fix(ci): resolve main Build & Test failures - pgx CVE, errcheck wave, config test regressions#1343
Conversation
Fixes govulncheck finding GO-2026-5004 (SQL injection via dollar-quoted string constants in the simple protocol) reported by CI on main since 2026-06-11.
golangci-lint v2.10.1 errcheck with check-blank:true and check-type-assertions:true flagged unchecked error returns and single-value type assertions in 29 files. Changes by category: - Defer rollback errors: log instead of blank-assign (cleanup-lambda, rekey, commitmentopts/store_postgres) - Interactive CLI ReadString errors: return error to caller (configure_azure, configure_gcp) - buildResponse: drop unused error return; callers use single-return form; update two test callers accordingly (handler, handler_test, handler_coverage_test) - Intentionally best-effort operations: capture error and log at warn/ debug (handler.go resolveSourceIdentity STS, dashboard resolveAWS*, recommendations_refresh ClearCollectionStarted x2, registrations requireAdmin, ri_utilization_cache singleflight, service_mfa UpdateUser, scheduler errgroup.Wait x2 and GetRecommendations, execution SavePurchaseExecution, credentials resolver LoadRaw, server/http w.Write) - singleflight.Do: use two-value if-init form so error is captured (service_apikeys, ri_utilization_cache) - Type assertions: two-value form with panic on mismatch in mocks (stores, secretsmanager, ses, sns - 57 sites) and test_helpers (auth - 10 sites) - Type assertion: validation.go gcp payload type field - Connection pool: ok-check on sync.Map type assertion (connection.go) - DevKey: panic on invalid compile-time hex constant (cipher.go) - migrate.go: capture version check error before log - testhelpers: log container Terminate error in cleanup path - handler_accounts: add logging import, fail-closed on HasCredential error
…ests Fixes the three integration tests failing on main since 2026-06-11 (issue #1339): - TestPostgresStore_PurchaseExecutions/Get_execution_by_ID_-_not_found expected an error but GetExecutionByID returned (nil, nil) for a missing row. That silent-nil contract violates the fail-loud policy and forced every caller to nil-check. GetExecutionByID now returns an error wrapping config.ErrNotFound; a nil error guarantees a non-nil execution. All callers (handler_purchases, revoke, scoping, purchase approvals/messages, store-internal CAS probes) switched to errors.Is(err, config.ErrNotFound) and their dead nil-checks were removed. Mock fixtures that simulated not-found via (nil, nil) now return the sentinel error. The pgxmock unit test TestGetExecutionByID_NotFound already asserted the error contract. - TestPostgresStore_UpsertRecommendations_AccountScopedEviction and TestPostgresStore_UpsertRecommendations_AmbientAndRegisteredCoexist failed with an FK violation: recommendations.cloud_account_id REFERENCES cloud_accounts(id) (migration 000030) but the tests never created the referenced accounts. Production upserts only carry registered account IDs, so the tests now seed the cloud accounts first via a new seedRecommendationCloudAccount helper.
…r handling
Review follow-ups on the errcheck/test-fix stack:
- commitmentopts.Save: the deferred rollback logged WARN on every
successful Save because pgx Tx.Rollback after Commit returns
pgx.ErrTxClosed. Swallow that single sentinel (same filter as
config.WithTx) so only real rollback failures are logged.
- TransitionExecutionStatus CAS probe: a hard DB error from the
zero-row follow-up probe was mapped to ErrNotFound, making a DB
outage read as a benign race-loss to the purchase reaper. Only
errors.Is(existErr, ErrNotFound) maps to the race-loss sentinel;
other probe errors propagate. New pgxmock regression test asserts a
hard probe error is not ErrNotFound (verified failing pre-fix).
- interfaces.go: document the GetExecutionByID contract on the
interface method (wraps ErrNotFound; never returns (nil, nil)).
- configure-azure/configure-gcp: reader.ReadString('\n') returns
(data, io.EOF) for a final unterminated line, so the new error
returns discarded valid piped input (printf "r" | cudly
configure-azure). New readTrimmedLine helper tolerates io.EOF when
data was read; all interactive reads go through it.
|
Important Review skippedNo new commits to review since the last review. ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughUpdates the pgx dependency, changes execution lookups to return wrapped ChangesCI Red Main Fixes: CVE, errcheck, not-found regression
Estimated code review effort: 4 (Complex) | ~75 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cmd/configure_azure.go (1)
411-422: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winThread the existing reader into
executeExplicitCommand.
A freshbufio.Readeronos.Stdincan skip bytes already buffered by the caller’s reader, so the retry prompt can lose piped input after earlier prompts consume part of the stream. Pass the existingreaderthrough instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/configure_azure.go` around lines 411 - 422, The retry prompt in executeExplicitCommand is creating a new bufio.NewReader on os.Stdin instead of reusing the caller’s reader, which can drop already-buffered input. Update executeExplicitCommand to accept and use the existing reader passed in from the surrounding flow, and thread that reader through the prompt logic so readTrimmedLine consumes from the same stream consistently.cmd/configure_gcp.go (1)
417-428: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winThread the existing reader through
executeGCPCommand. Creating a freshbufio.NewReader(os.Stdin)here can miss buffered piped input and skip the follow-upy/Nresponse; pass the caller’s reader down instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/configure_gcp.go` around lines 417 - 428, Thread the existing input reader through executeGCPCommand instead of creating a new bufio.NewReader(os.Stdin) for the y/N prompt. Update the caller and the executeGCPCommand flow so the same reader used by readTrimmedLine is reused for the follow-up confirmation, preserving any buffered piped input. Use the existing executeGCPCommand and readTrimmedLine symbols to locate the prompt handling and replace the local reader creation with the passed-in reader.internal/credentials/resolver.go (1)
408-417: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winTreat
LoadRawerrors as failures, not absence.
CredentialStore.LoadRawreturnsnil, nilonly when no credential exists. Logging the error and continuing can switch a workload-identity-federation account into the federated path when the stored WIF config couldn’t be read, silently changing auth behavior on a real store failure.🐛 Proposed fix
var raw []byte if store != nil { var loadErr error raw, loadErr = store.LoadRaw(ctx, account.ID, CredTypeGCPWIFConfig) if loadErr != nil { - log.Printf("credentials: LoadRaw for account %s: %v (treating as absent)", account.ID, loadErr) + return nil, fmt.Errorf("credentials: check stored gcp credential for account %s: %w", account.ID, loadErr) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/credentials/resolver.go` around lines 408 - 417, The credential lookup in resolver.Load should not treat CredentialStore.LoadRaw failures as “absent” WIF config. Update the code around the raw/loadErr handling so that only a nil result means no stored CredTypeGCPWIFConfig, and any non-nil loadErr is returned or otherwise surfaced as a real failure instead of falling back to the federated path. Keep the account.ID context in the error path and preserve the existing absence behavior only when LoadRaw returns nil, nil.
🧹 Nitpick comments (4)
internal/mocks/stores.go (1)
5-5: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoffConsider a generic helper to eliminate the repeated cast-and-panic boilerplate.
The same 5-line pattern (
args.Get(0).(T)→okcheck → formatted panic) is repeated across ~40 methods in this file, plus similarly ininternal/auth/test_helpers.go,internal/mocks/secretsmanager.go,internal/mocks/ses.go, andinternal/mocks/sns.go. A small generic helper would remove the duplication and guarantee message-format consistency (e.g., stores.go/secretsmanager.go/ses.go/sns.go use"mock: expected %s, got %T"while test_helpers.go uses"MockStore.Method: expected %s, got %T").♻️ Example generic helper (requires Go 1.18+ generics support)
func mustCast[T any](v any) T { t, ok := v.(T) if !ok { panic(fmt.Sprintf("mock: expected %T, got %T", *new(T), v)) } return t }Usage:
v := mustCast[*config.GlobalConfig](args.Get(0))This is a broad, cross-file mechanical refactor spanning 5 files and ~50+ call sites, so treat as optional/deferred rather than blocking.
Also applies to: 67-71, 86-90, 105-109, 133-137, 176-180, 199-203, 240-244, 253-257, 266-270, 299-303, 318-322, 331-335, 344-348, 357-361, 370-374, 411-415, 428-468, 488-504, 518-522, 531-535, 562-566, 575-579, 600-604, 631-635, 656-660, 695-699, 708-712, 721-725, 779-783, 797-801, 834-838, 885-889, 919-923, 950-964, 997-1001, 1012-1016, 1035-1039, 1060-1088, 1150-1154, 1169-1173, 1201-1205, 1214-1218, 1227-1231, 1240-1244, 1258-1262
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mocks/stores.go` at line 5, Repeated cast-and-panic boilerplate appears across the mock helpers in stores.go and the related test/mock files, so extract it into a small generic helper to make the pattern consistent and reduce duplication. Add a reusable helper (for example in the mocks package) that does the type assertion and formatted panic once, then replace the repeated args.Get(0).(T)/ok-check/panic blocks in stores.go, test_helpers.go, secretsmanager.go, ses.go, and sns.go with calls to that helper, preserving each file’s existing panic message style via the helper’s format string or wrapper.internal/api/handler.go (2)
426-436: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNon-nil
errpath silently swallows the original error.The marshal-failure branch below logs via
logging.Errorfbefore returning 500, but thiserr != nilbranch does not log the incoming error at all — it's discarded entirely. Since every current production caller passesnil(making this branch test-only for now), the gap isn't user-visible yet, but if a future caller starts passing a real error here, the root cause will be lost with no observability trail.♻️ Proposed fix to log the swallowed error
func (h *Handler) buildResponse(statusCode int, headers map[string]string, body any, err error) *events.LambdaFunctionURLResponse { if err != nil { + logging.Errorf("buildResponse called with error: %v", err) return &events.LambdaFunctionURLResponse{ StatusCode: 500, Headers: headers, Body: `{"error": "internal server error"}`, } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/api/handler.go` around lines 426 - 436, The non-nil err path in Handler.buildResponse currently returns a 500 without recording the incoming error, so the original failure is lost. Update the buildResponse method to log the provided err before returning the internal server error response, using the same logging style already used in the marshal-failure branch, so future callers get an observability trail for this path.
371-380: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueDrop the always-nil error return from
HandleRequest
executeRequestfolds every route failure into a response, soHandleRequestnever returns a non-nil error. The checks ininternal/server/http.goandinternal/server/lambda.goare dead code; either simplify the signature in a follow-up or document the invariant.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/api/handler.go` around lines 371 - 380, `executeRequest` always converts routing failures into a response and returns a nil error, so the error path from `HandleRequest` is unreachable. Update the handler flow around `executeRequest` and `HandleRequest` to reflect that invariant by either removing the always-nil error from the return contract or clearly documenting that it never returns an error. Then simplify the callers in `internal/server/http.go` and `internal/server/lambda.go` so they no longer branch on a non-existent error.internal/api/handler_purchases.go (1)
241-247: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winCorrect ErrNotFound migration; consider extracting a shared helper.
All six call sites correctly replace the
execution == nilcheck witherrors.Is(err, config.ErrNotFound)before falling through to the genericerr != nilwrap, consistent with the new fail-loudGetExecutionByIDcontract. Logic is correct in each case.The same three-line pattern (
GetExecutionByID→errors.Is(..., ErrNotFound)→err != nilwrap) is now duplicated six times in this file. Consider extracting a small helper to reduce repetition and centralize the not-found message:♻️ Suggested helper
+func (h *Handler) getExecutionOr404(ctx context.Context, execID string) (*config.PurchaseExecution, error) { + execution, err := h.config.GetExecutionByID(ctx, execID) + if errors.Is(err, config.ErrNotFound) { + return nil, NewClientError(404, "execution not found") + } + if err != nil { + return nil, fmt.Errorf("failed to get execution: %w", err) + } + return execution, nil +}Callers with a bespoke 404 message (e.g.
cancelOrRecoverExecution's formatted message, orauthorizeExecutionManagement'serrNotFoundsentinel) can keep their own inline handling.Also applies to: 383-389, 433-440, 832-838, 1152-1158, 1457-1466
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/api/handler_purchases.go` around lines 241 - 247, The GetExecutionByID error handling pattern is duplicated across multiple call sites, so extract a small shared helper to centralize the common errors.Is(err, config.ErrNotFound) check and generic wrap. Update the repeated logic in the execution-loading paths in handler_purchases.go to call the helper, while preserving any bespoke 404 behavior in places like cancelOrRecoverExecution and authorizeExecutionManagement. Use the existing GetExecutionByID, config.ErrNotFound, and errNotFound symbols to keep the refactor aligned with the current contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cmd/configure_azure.go`:
- Around line 411-422: The retry prompt in executeExplicitCommand is creating a
new bufio.NewReader on os.Stdin instead of reusing the caller’s reader, which
can drop already-buffered input. Update executeExplicitCommand to accept and use
the existing reader passed in from the surrounding flow, and thread that reader
through the prompt logic so readTrimmedLine consumes from the same stream
consistently.
In `@cmd/configure_gcp.go`:
- Around line 417-428: Thread the existing input reader through
executeGCPCommand instead of creating a new bufio.NewReader(os.Stdin) for the
y/N prompt. Update the caller and the executeGCPCommand flow so the same reader
used by readTrimmedLine is reused for the follow-up confirmation, preserving any
buffered piped input. Use the existing executeGCPCommand and readTrimmedLine
symbols to locate the prompt handling and replace the local reader creation with
the passed-in reader.
In `@internal/credentials/resolver.go`:
- Around line 408-417: The credential lookup in resolver.Load should not treat
CredentialStore.LoadRaw failures as “absent” WIF config. Update the code around
the raw/loadErr handling so that only a nil result means no stored
CredTypeGCPWIFConfig, and any non-nil loadErr is returned or otherwise surfaced
as a real failure instead of falling back to the federated path. Keep the
account.ID context in the error path and preserve the existing absence behavior
only when LoadRaw returns nil, nil.
---
Nitpick comments:
In `@internal/api/handler_purchases.go`:
- Around line 241-247: The GetExecutionByID error handling pattern is duplicated
across multiple call sites, so extract a small shared helper to centralize the
common errors.Is(err, config.ErrNotFound) check and generic wrap. Update the
repeated logic in the execution-loading paths in handler_purchases.go to call
the helper, while preserving any bespoke 404 behavior in places like
cancelOrRecoverExecution and authorizeExecutionManagement. Use the existing
GetExecutionByID, config.ErrNotFound, and errNotFound symbols to keep the
refactor aligned with the current contract.
In `@internal/api/handler.go`:
- Around line 426-436: The non-nil err path in Handler.buildResponse currently
returns a 500 without recording the incoming error, so the original failure is
lost. Update the buildResponse method to log the provided err before returning
the internal server error response, using the same logging style already used in
the marshal-failure branch, so future callers get an observability trail for
this path.
- Around line 371-380: `executeRequest` always converts routing failures into a
response and returns a nil error, so the error path from `HandleRequest` is
unreachable. Update the handler flow around `executeRequest` and `HandleRequest`
to reflect that invariant by either removing the always-nil error from the
return contract or clearly documenting that it never returns an error. Then
simplify the callers in `internal/server/http.go` and
`internal/server/lambda.go` so they no longer branch on a non-existent error.
In `@internal/mocks/stores.go`:
- Line 5: Repeated cast-and-panic boilerplate appears across the mock helpers in
stores.go and the related test/mock files, so extract it into a small generic
helper to make the pattern consistent and reduce duplication. Add a reusable
helper (for example in the mocks package) that does the type assertion and
formatted panic once, then replace the repeated args.Get(0).(T)/ok-check/panic
blocks in stores.go, test_helpers.go, secretsmanager.go, ses.go, and sns.go with
calls to that helper, preserving each file’s existing panic message style via
the helper’s format string or wrapper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 78dbcb9b-5e86-4ff6-9d9b-210c14000a7e
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (44)
cmd/cleanup-lambda/main.gocmd/configure_azure.gocmd/configure_gcp.gocmd/rekey/main.gogo.modinternal/api/handler.gointernal/api/handler_accounts.gointernal/api/handler_coverage_test.gointernal/api/handler_dashboard.gointernal/api/handler_purchases.gointernal/api/handler_purchases_revoke.gointernal/api/handler_purchases_revoke_test.gointernal/api/handler_purchases_test.gointernal/api/handler_recommendations_refresh.gointernal/api/handler_registrations.gointernal/api/handler_test.gointernal/api/ri_utilization_cache.gointernal/api/scoping.gointernal/api/validation.gointernal/auth/service_apikeys.gointernal/auth/service_mfa.gointernal/auth/test_helpers.gointernal/commitmentopts/store_postgres.gointernal/config/interfaces.gointernal/config/store_postgres.gointernal/config/store_postgres_db_test.gointernal/config/store_postgres_pgxmock_test.gointernal/config/store_postgres_recommendations_test.gointernal/credentials/cipher.gointernal/credentials/resolver.gointernal/database/connection.gointernal/database/postgres/migrations/migrate.gointernal/database/postgres/testhelpers/postgres.gointernal/mocks/secretsmanager.gointernal/mocks/ses.gointernal/mocks/sns.gointernal/mocks/stores.gointernal/purchase/approvals.gointernal/purchase/approvals_test.gointernal/purchase/execution.gointernal/purchase/messages.gointernal/purchase/messages_test.gointernal/scheduler/scheduler.gointernal/server/http.go
…tial load errors CodeRabbit review follow-ups on PR #1343: - configure-azure/configure-gcp: executeExplicitCommand and executeGCPCommand created a fresh bufio.NewReader(os.Stdin) for the "Continue anyway?" retry prompt, which can drop input already buffered by the caller's reader and break piped input after earlier prompts. The caller's reader is now threaded through so all prompts consume one consistent buffered stream. - credentials resolver: a LoadRaw failure while peeking at the stored WIF config was logged and treated as absence, silently flipping a stored-JSON workload-identity-federation account onto the federated path on a store outage (auth behavior change on failure). LoadRaw returns (nil, nil) only for genuine absence, so store errors now propagate. New regression test asserts the error surfaces instead of falling through (verified failing pre-fix).
|
@coderabbitai review |
✅ Action performedReview finished.
|
Four compatibility fixes required after rebasing onto main which included PRs #1343 (errcheck), #1346 (ladder-sp-coverage), and #1340 (ladder-allocate): - cmd/cleanup-lambda: remove unused `errors` and `pgx` imports left over from conflict resolution (main's rollback pattern doesn't use them) - internal/mocks/stores.go: update ListStoredRecommendations signature to pointer (*RecommendationFilter) matching the interface change in #1343 - internal/scheduler/scheduler.go: pass ¶ms (pointer) to GetRecommendations after PR #1346 changed the function signature - providers/aws/service_client_test.go: add GetSavingsPlansCoverage and GetSavingsPlansUtilization stub methods to mockCostExplorerClient after PR #1346 extended the CostExplorerAPI interface
Closes #1339
"CI - Build & Test" has been red on
mainsince 2026-06-11. This PR resolves the three failure groups in four commits:Commits
fix(deps): bump github.com/jackc/pgx/v5 to v5.9.2 for GO-2026-5004- fixes the govulncheck finding (SQL injection via dollar-quoted string constants in the simple protocol).fix(lint): address all errcheck findings across cmd/ and internal/- fixes all 107 errcheck findings (check-blank+check-type-assertions) across 29 files: deferred rollbacks now log, interactive CLI reads propagate errors,buildResponsedrops its always-nil error return, best-effort operations log at warn/debug, 67 single-value type assertions in mocks/test helpers switched to the two-value form with panic-on-mismatch. errcheck is now at 0.fix(config): fail loud on missing execution and seed FK accounts in tests- fixes the three failing integration tests (root causes below).fix(api): filter ErrTxClosed rollback noise and harden CAS probe error handling- review follow-ups: swallowpgx.ErrTxClosedin thecommitmentopts.Savedeferred rollback, propagate hard DB errors from theTransitionExecutionStatusCAS probe instead of mapping them toErrNotFound(with a pgxmock regression test verified failing pre-fix), document theGetExecutionByIDcontract on the interface, and tolerateio.EOF-with-data in the configure-azure/gcp interactive reads so piped input keeps working.Root causes of the three failing integration tests
TestPostgresStore_PurchaseExecutions/Get_execution_by_ID_-_not_found: genuine silent-failure regression.PostgresStore.GetExecutionByIDreturned(nil, nil)for a missing row, so the not-found case produced no error and every caller had to nil-check (several forgot). It now returns an error wrappingconfig.ErrNotFound; a nil error guarantees a non-nil execution. All callers (purchase handlers, revoke, scoping, approvals/messages, store-internal CAS probes) switched toerrors.Is(err, config.ErrNotFound), and the pre-existing pgxmock unit testTestGetExecutionByID_NotFoundalready asserted this error contract.TestPostgresStore_UpsertRecommendations_AccountScopedEvictionand..._AmbientAndRegisteredCoexist: missing test pre-condition.recommendations.cloud_account_idcarries an FK tocloud_accounts(id)(migration 000030), but the tests inserted rows for account UUIDs never created incloud_accounts, failing with an FK violation. Production upserts only carry registered account IDs, so the tests now seed the accounts first via aseedRecommendationCloudAccounthelper.Notes
resolveTargetCoverageininternal/api/handler_dashboard.go:380has a display-side default fallback that is flagged as a future no-silent-fallback candidate; out of scope here.Verification
go build ./...exit 0;go vet ./...exit 0golangci-lint run --enable-only errcheck ./...-> 0 issues, exit 0go test ./...-> 5481 passed in 38 packages, exit 0go test -tags=integration ./internal/config/(dockerized Postgres testcontainers) -> 679 passed, exit 0; the three previously failing tests re-verified uncached against the committed codeSummary by CodeRabbit