Skip to content

Redesign SDK for v2 platform API: services, typed errors, iter, streaming#19

Open
chrisghill wants to merge 12 commits into
mainfrom
sdk-redesign-for-v2
Open

Redesign SDK for v2 platform API: services, typed errors, iter, streaming#19
chrisghill wants to merge 12 commits into
mainfrom
sdk-redesign-for-v2

Conversation

@chrisghill
Copy link
Copy Markdown
Member

Summary

Full v2 redesign of the Go SDK. Replaces the WIP stub on main with a receiver-based, idiomatic Go API that tracks the platform's GraphQL v2 surface. Highlights:

  • Receiver-based services. Top-level *massdriver.Client aggregates a *Service per domain (c.Projects, c.Deployments, c.Resources, c.Groups, etc.). Functional options for construction (WithAPIKey, WithOrganizationID, WithBaseURL, WithProfile, WithGQLClient, WithTimeout).
  • Typed errors. Sentinels in gql (ErrNotFound, ErrUnauthenticated, ErrForbidden) classifiable with errors.Is. Mutation validation surfaced as *gql.MutationFailed with per-field messages.
  • Lazy pagination. Iter methods return iter.Seq2[T, error] (Go 1.23 range-over-func) for unbounded queries; List auto-follows cursors and returns a slice for bounded ones.
  • Live log streaming. Deployments.StreamLogs / TailLogs over an Absinthe websocket subscription.
  • Two API surfaces, separately documented. massdriver/platform is the GraphQL public SDK (PAT / service-account auth). massdriver/provisioning is the deployment-token-only REST surface used from inside provisioner runs (xo CLI, Terraform provider). READMEs in both dirs spell this out.
  • Live-API integration suite. Build-tag gated *_integration_test.go per package, plus a cross-package scenario test under massdriver/test. Run on demand via make test-integration against a real sandbox.

Tooling

  • golangci-lint v2 config, with per-path exclusion rules tuned to the codebase (0 issues at the time of merge).
  • GHA workflow runs lint on PR + main pushes.
  • Makefile with help, build, vet, lint, test, test-integration, check, schema, generate targets.
  • make schema curls the public schema from api.massdriver.cloud/graphql/v2/schema.graphql; make generate runs genqlient.

Notes for reviewers

  • 141 files / +54k −0.9k. The bulk is generated genqlient code (massdriver/internal/gen/zz_generated.go) and the schema; SDK-authored code is a fraction of the total. Filter :!massdriver/internal/gen/* to focus the diff.
  • Integration tests are serialized (-p 1) because they share a single sandbox org and the platform's custom-attribute schema validation makes parallel package runs flake. Documented in the Makefile.
  • massdriver/provisioning/ was renamed from massdriver/services/ to free the Service name for the per-package service receivers.

@chrisghill chrisghill requested a review from coryodaniel May 10, 2026 19:27
@chrisghill
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 10, 2026

Claude finished @chrisghill's task in 2m 52s —— View job


Code Review: Redesign SDK for v2 platform API

  • Read CLAUDE.md and repo setup
  • Review core client architecture (massdriver/client.go, massdriver/options.go)
  • Review typed errors (massdriver/gql/errors.go)
  • Review GQL client (massdriver/gql/client.go)
  • Review pagination / iter implementation
  • Review websocket streaming (massdriver/internal/absinthe/, deployments/stream.go)
  • Review service implementations (sample)
  • Review test coverage and patterns
  • Review config/credentials handling
  • Compile findings and post review

This is a well-designed, idiomatic Go SDK with strong documentation, sensible abstractions, and thorough test coverage. The review below separates bugs from suggestions.


Bugs / Correctness Issues

1. RoundTrip violates the http.RoundTripper contract [massdriver/gql/client.go:26]

The http.RoundTripper documentation states: "RoundTrip must not modify the request". The current implementation mutates req.Header in place, which can produce incorrect behavior if the same *http.Request is ever retried by a wrapper (e.g., some http.Transport retry paths, or test code that replays a request).

// current — modifies caller's request
func (r *roundTripperWithHeaders) RoundTrip(req *http.Request) (*http.Response, error) {
    for k, v := range r.Headers {
        req.Header.Set(k, v)  // ← mutates the original
    }
    return r.Base.RoundTrip(req)
}

// correct — clone first
func (r *roundTripperWithHeaders) RoundTrip(req *http.Request) (*http.Response, error) {
    req = req.Clone(req.Context())
    for k, v := range r.Headers {
        req.Header.Set(k, v)
    }
    return r.Base.RoundTrip(req)
}

Fix this →


2. time.After in push leaks timers [massdriver/internal/absinthe/socket.go:164]

time.After(replyTimeout) allocates a timer that is not garbage-collected until it fires — even if push returns early because the reply arrived or ctx was cancelled. Under any concurrency load (many concurrent subscriptions/heartbeats), this is a steady timer leak for the duration of replyTimeout per call.

// current
case <-time.After(replyTimeout):

// correct
timer := time.NewTimer(replyTimeout)
defer timer.Stop()
// ...
case <-timer.C:

Fix this →


3. Wrapping a potentially-nil error produces "...: <nil>" [massdriver/internal/absinthe/socket.go:163]

When s.done closes due to a clean server shutdown, s.closeErr is nil (the isNormalClose guard prevents it from being set). The <-s.done select arm then does:

return reply{}, fmt.Errorf("absinthe socket closed: %w", s.Err())

fmt.Errorf with %w and a nil error produces the string "absinthe socket closed: <nil>", which is confusing. Check before wrapping:

case <-s.done:
    if err := s.Err(); err != nil {
        return reply{}, fmt.Errorf("absinthe socket closed: %w", err)
    }
    return reply{}, errors.New("absinthe socket closed")

Security

4. Bearer token in WebSocket query string [massdriver/internal/absinthe/socket.go:337]

The comment at buildWSURL says the token is sent as a query parameter "because browsers can't set headers on WebSocket upgrades." This SDK is server-side Go — gorilla/websocket.DialContext accepts a http.Header argument. Query-string tokens appear in server access logs, proxy logs, and load-balancer logs, which widens the exposure surface of the PAT.

// gorilla supports headers on dial:
conn, _, err := websocket.DefaultDialer.DialContext(ctx, wsURL, http.Header{
    "Authorization": {"Bearer " + token},
})

Consider moving the token to an Authorization header for the WebSocket handshake and dropping it from the URL. The comment should also be updated if the browser-compatibility rationale no longer applies.


Minor Issues

5. Dead code in source-inference fallback [massdriver/config/credentials.go:96-100]

Inside the organizationID != "" && apiKey != "" branch, the origin == SourceUnknown path infers the source by checking which layer is non-empty. The final default: source = SourceEnv can never fire: we already established apiKey != "" via cmp.Or(envs.APIKey, profile.APIKey), so either envs.APIKey != "" or profile.APIKey != "" must hold. The dead branch is noise and mildly misleading.


6. UUID org-ID check is a hard error, not a deprecation [massdriver/config/config.go:193-195]

uuidErr := uuid.Validate(cfg.OrganizationID)
if uuidErr == nil {
    return fmt.Errorf("organization ID is a UUID. This is deprecated...")
}

Any user who has a UUID org ID currently stored in their profile YAML gets an immediate hard failure on upgrade. Depending on how many users are affected, this could be a significant compatibility break. A logged warning that still allows the call to proceed (and a deprecation window) would be less disruptive.


7. Duplicated userAgent() [massdriver/gql/client.go:51-68]

The comment explicitly acknowledges the duplication exists to break an import cycle. A clean fix is to move the version-resolution logic to a massdriver/internal/version package (which imports neither gql nor client) and let both importers call into it. Not blocking, but the current state accumulates future maintenance debt if the logic ever needs to change.


Design Observations (all positive)

  • Functional options with timeoutSet flag: The bool sentinel correctly distinguishes WithTimeout(0) (disable) from "no option given" (use default) — a common Go pitfall handled correctly.
  • errors.Join for sentinel layering: ClassifyError uses errors.Join(err, sentinel) to attach the classifiable sentinel without losing the original transport error. errors.As on the result still exposes *graphql.HTTPError. Correct and ergonomic.
  • CheckMutation generic: The two-type-parameter trick (M any, PM interface{ *M; validationMessage }) avoids the pointer-receiver problem elegantly without reflection. This is idiomatic Go 1.21+.
  • iter.Seq2 pagination: Clean adoption of the Go 1.23 range-over-func pattern. The yield-returns-false early-stop path is correctly handled at if !yield(*d, nil) { return }.
  • gqltest package: The queued-response / recorded-request mock is simple and powerful. Pending() in t.Cleanup is a clean way to assert all mocked responses were consumed.
  • timeWrapHook in decode: The mapstructure intermediate-map representation strips unexported fields on time.Time, which would silently zero all timestamps. The wrap/unwrap approach is the correct fix.
  • Cursor type with omitempty: Hand-rolled to avoid sending limit: 0, preventing server-side validation errors. The comment explaining why is exactly the right thing to document.
  • Two public surfaces (platform / provisioning): Clean separation with separate READMEs. Makes auth requirements explicit at the package boundary rather than buried in runtime errors.

Summary

Three items warrant attention before merge: the RoundTripper mutation bug (correctness), the time.After timer leak (resource), and the WebSocket token exposure (security). The rest are minor and several have a simple one-liner fix. The overall quality is high — the architecture is sound, the test coverage is solid, and the documentation is thorough.
|

@chrisghill chrisghill requested a review from ChristensenJoe May 12, 2026 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants