feat(groups): add ServiceAccountAccessToken managed resource #326

Merged
markussiebert merged 4 commits into master from feat/group-service-account-access-token
Jun 18, 2026

@markussiebert markussiebert commented Jun 16, 2026

Collaborator

Description of your changes

Why: Running Crossplane against GitLab usually means handing the provider one broad, long-lived personal access token through a ProviderConfig — tied to a real user, shared across everything.

GitLab group service accounts change the economics: they don't consume a license seat* so you can create as many as you want. They're also a relatively new feature — introduced in GitLab 16.1 (API), rolled out to GitLab.com in 16.3, and generally available on GitLab Self-Managed from 17.6 — so this fills a gap the provider hasn't covered before. Pair them with Crossplane's namespaced ProviderConfigs — every namespace gets its own service account and its own scoped credential.

This PR makes that usable end to end:

  • Least privilege. A group service-account token scopes all automation to that service account's permissions.
  • Fully namespaced. The token lives in a namespaced Secret, is consumed by a namespaced ProviderConfig, and reconciles namespaced resources.
  • Self-sustaining. Because a service account can manage its own token, a short-lived token can keep itself alive by self-rotating. The whole loop can run on nothing but the service account's own credential — so a team can manage nearly everything inside their GitLab group from their own namespace, with a credential that rotates itself and never needs a human to refresh it.

To enable this, the PR adds a new namespaced groups.gitlab.m.crossplane.io/v1alpha1 ServiceAccountAccessToken managed resource — the token half that the existing ServiceAccount and group AccessToken resources don't cover. It has two modes, selected automatically.

Owner mode (default)

The ProviderConfig authenticates as a group owner (a group AccessToken is not enough; it must be an instance service account or a user), and the token is managed through the service-account endpoints of gitlab.com/gitlab-org/api/client-go:

| Action | Endpoint |
| --- | --- |
| Create | Groups.CreateServiceAccountPersonalAccessToken |
| Observe | Groups.ListServiceAccountPersonalAccessTokens (matched by token id; there is no get-by-id endpoint) |
| Rotate | Groups.RotateServiceAccountPersonalAccessToken |
| Revoke (delete) | Groups.RevokeServiceAccountPersonalAccessToken |

expiresAt / renewalPeriodDays + renewBeforeDays rotation semantics match the group AccessToken controller. The external name is the token id, and the token value is written to the connection secret on create/rotate.

Self-managed mode (the self-rotating loop)

When the ProviderConfig authenticates with the very token this resource manages, the provider is acting as the service account and can only use the self endpoints. This is detected deterministically — no guessing: self-managed mode is entered when the ProviderConfig uses method: PersonalAccessToken and its credentials.secretRef (namespace, name, key) matches the resource's writeConnectionSecretToRef (i.e. the resource writes its token into the very secret the ProviderConfig reads).

In that mode:

| Action | Endpoint |
| --- | --- |
| Observe | GET /personal_access_tokens/self (self-inform; external name auto-adopted from the response) |
| Rotate | RotatePersonalAccessTokenSelf |
| Revoke (delete) | RevokePersonalAccessTokenSelf |

A dead/expired self-token surfaces as a clear terminal error (reseed the credentials secret) rather than thrashing doomed rotations.
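
The self-managed detection rule described above can be sketched roughly as follows for the namespaced case. All types and the tokenKey parameter are hypothetical stand-ins, not the provider's actual code; in particular, the key under which the token is written to the connection secret is an assumption passed in by the caller.

```go
package main

import "fmt"

// credentialRef is a hypothetical stand-in for the ProviderConfig's
// credentials.secretRef (namespace, name, key).
type credentialRef struct {
	Namespace, Name, Key string
}

// localWriteRef is a hypothetical stand-in for a namespaced resource's
// writeConnectionSecretToRef, which names a secret in the resource's own namespace.
type localWriteRef struct {
	Name string
}

// isSelfManaged reports whether the ProviderConfig reads the very secret key
// this resource writes its token to. tokenKey is an assumed parameter: the
// connection-secret key that holds the token value.
func isSelfManaged(method string, creds *credentialRef, crNamespace string, w *localWriteRef, tokenKey string) bool {
	if method != "PersonalAccessToken" || creds == nil || w == nil {
		return false
	}
	return creds.Namespace == crNamespace && creds.Name == w.Name && creds.Key == tokenKey
}

func main() {
	creds := &credentialRef{Namespace: "team-a", Name: "gitlab-sa-token", Key: "token"}
	write := &localWriteRef{Name: "gitlab-sa-token"}

	fmt.Println(isSelfManaged("PersonalAccessToken", creds, "team-a", write, "token"))
	fmt.Println(isSelfManaged("PersonalAccessToken", creds, "team-b", write, "token"))
}
```

Because the decision depends only on the two references, it needs no API call and no trial-and-error against GitLab.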

Bootstrap note (reviewers): In self-managed mode the rotated token is written back into the secret the ProviderConfig reads. Crossplane only writes connection secrets it controls, so a hand-created bootstrap secret must use the connection type connection.crossplane.io/v1alpha1 (a default Opaque secret is rejected). Alternatively, bootstrap in owner mode first and switch providerConfigRef to the self ProviderConfig. This is documented in the README and examples/groups/serviceaccountaccesstoken.yaml.

Other details

  • The detected mode is reported via a SelfManaged status condition (+ a SELF print column).
  • The identity fields (groupId, serviceAccountId, name, scopes) are immutable, enforced with CEL XValidation; the rotation-timing fields stay mutable.
  • The cluster-scoped variant, CRDs and generated code are produced by make generate.

Follow-up: The same secret-reference detection could replace the group AccessToken controller's 401-based self-rotate fallback; I'll track that as a separate issue rather than widen this PR.

Fixes #324

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review. (See testing notes — equivalent steps were run individually; local golangci-lint is 2.5.0 vs CI's 2.11.2.)

How has this code been tested

  • Unit tests for the client helpers (GenerateCreate/Rotate/Observation, ShouldRotate, paginated GetServiceAccountAccessToken lookup) and for the controller Observe/Create/Update/Delete in both modes:
    • owner mode — up-to-date, revoked, not-found, rotate, fresh-create;
    • self mode — self-inform up-to-date, rotation-due, auto-adopt of the external name, dead-token terminal error, self-rotate, self-revoke, and the SelfManaged condition.
  • make generate, go build ./..., go vet ./..., golangci-lint run ./... (clean; local 2.5.0), and the full go test ./... suite pass (the cluster-scoped variant is generated and tested too).
  • Live end-to-end on a KWOK cluster with the provider run out-of-cluster against a real GitLab instance: owner mode created a group, service account and token (Ready/Synced); self-managed mode then self-rotated the token before expiry, writing the new value back into the credentials secret and updating the external name — confirming the self-sustaining loop closes.

@markussiebert markussiebert force-pushed the feat/group-service-account-access-token branch 2 times, most recently from 1b52b0f to 929ff4b Compare June 16, 2026 18:29
@markussiebert markussiebert self-assigned this Jun 16, 2026
Adds a group-scoped ServiceAccountAccessToken managed resource that manages
the personal access token of a group service account.

Owner mode (default): the ProviderConfig is a group owner and the token is
managed via the service-account endpoints:

- Create  -> Groups.CreateServiceAccountPersonalAccessToken
- Observe -> Groups.ListServiceAccountPersonalAccessTokens (match by token id)
- Rotate  -> Groups.RotateServiceAccountPersonalAccessToken
- Revoke  -> Groups.RevokeServiceAccountPersonalAccessToken

Self-managed mode: when the referenced ProviderConfig authenticates with the
very token this resource writes to its connection secret (detected when the
PersonalAccessToken credential secretRef matches writeConnectionSecretToRef by
namespace, name and key), the provider acts as the service account itself and
uses the self endpoints instead:

- Observe -> GET /personal_access_tokens/self (self-inform; external name is
  auto-adopted from the response)
- Rotate  -> RotatePersonalAccessTokenSelf
- Revoke  -> RevokePersonalAccessTokenSelf

This enables a self-sustaining loop of short-lived, self-rotating tokens used to
reconcile a group. A dead self-token surfaces as a clear terminal error
(reseed the credentials secret). A SelfManaged status condition reports the
detected mode.

The external name is the token id and the token value is written to the
connection secret on create/rotate. Rotation, expiresAt/renewalPeriodDays and
renewBeforeDays semantics match the group AccessToken controller. groupId,
serviceAccountId, name and scopes are immutable (enforced via CEL); the
rotation-timing fields stay mutable.

Fixes #324

Signed-off-by: Markus Siebert <markus.siebert@deutschebahn.com>
@derbauer97

derbauer97 commented Jun 17, 2026

Contributor

Thanks, this looks solid overall. I found four issues worth addressing before merge, and I also verified the behavior end to end in a local kind cluster against a disposable group on our own GitLab instance.

  1. In self-managed mode the controller adopts whatever PAT is in the ProviderConfig secret, but it does not validate that it matches spec.forProvider.serviceAccountId.

    • pkg/namespaced/controller/groups/serviceaccountaccesstokens/controller.go:201-228
    • pkg/cluster/controller/groups/serviceaccountaccesstokens/zz_controller.go:203-230

    This is not a privilege-escalation issue, because the wrong PAT must already be present in the secret, but it is still a correctness/safety problem: a miswired secret can make the resource rotate or revoke the wrong service account token while reconcile still succeeds.

  2. In owner mode the rotate -> create fallback is too broad.

    • pkg/namespaced/controller/groups/serviceaccountaccesstokens/controller.go:334-350
    • pkg/cluster/controller/groups/serviceaccountaccesstokens/zz_controller.go:336-351

    Right now any rotate error falls through to CreateServiceAccountPersonalAccessToken. That can mint a second PAT on transient failures or generic request errors.

    We tested the actual endpoint behavior against our own GitLab instance:

    • valid rotate -> 200
    • nonexistent token -> 404
    • wrong service_account_id path -> 404
    • revoked token -> 400 Bad request - Token already revoked

    So fallback-to-create seems reasonable for 404 and likely for the specific revoked-token 400, but not for generic 400, 403, 5xx, or transport errors.

  3. Cluster-scoped self-managed mode looks broken in practice.

    • pkg/cluster/controller/groups/serviceaccountaccesstokens/zz_controller.go:161-175

    isSelfManaged() compares ref.Namespace == cr.GetNamespace(), but a cluster-scoped resource has no namespace. For cluster-scoped resources writeConnectionSecretToRef is a full SecretReference, so this likely needs to compare ref.Namespace == w.Namespace instead.

    • apis/cluster/groups/v1alpha1/zz_generated.managed.go:493-495

    I also verified this end to end:

    • namespaced owner-mode works
    • namespaced self-managed mode works
    • self-managed auto-adoption works without manually setting the external name
    • switching the same namespaced resource from owner-mode to a self-managed ProviderConfig works
    • forcing rotation in self-managed mode rotated the PAT and updated the external name as expected
    • cluster-scoped owner-mode works
    • but cluster-scoped self-managed mode does not activate even when the ProviderConfig reads from the exact same secret that the resource writes to; it stays SelfManaged=False, Reason=OwnerManaged

  4. GetServiceAccountAccessToken paginates without filtering by state.

    • pkg/namespaced/clients/groups/serviceaccountaccesstoken.go:82-84
    • pkg/cluster/clients/groups/zz_serviceaccountaccesstoken.go:84-86

    The list endpoint supports state=active (docs), but the code does not set it. This means every Observe poll paginates through all tokens (including revoked/expired) to find the one matching the external name. For service accounts with many tokens this adds unnecessary latency and API rate-limit pressure.

    Adding State: gitlab.Ptr("active") to the list options would be a safe improvement: a revoked/missing token already surfaces as "not found" (nil result), which the controller already handles as ResourceExists: false.
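
To make point 4 concrete, here is a rough sketch of a paginated lookup that asks the server for active tokens only. The token, listOptions and lister types and the in-memory fake are hypothetical; they stand in for the client-go list call and are not its real signatures.

```go
package main

import "fmt"

// token, listOptions and lister are hypothetical stand-ins for the list API.
type token struct {
	ID    int
	State string
}

type listOptions struct {
	State   string
	Page    int
	PerPage int
}

// lister returns one page of tokens and the next page number (0 when done).
type lister func(opts listOptions) ([]token, int, error)

// findToken pages through the active tokens until it finds id.
// A nil result with a nil error means "not found".
func findToken(list lister, id int) (*token, error) {
	opts := listOptions{State: "active", Page: 1, PerPage: 2}
	for {
		items, next, err := list(opts)
		if err != nil {
			return nil, err
		}
		for i := range items {
			if items[i].ID == id {
				return &items[i], nil
			}
		}
		if next == 0 {
			return nil, nil
		}
		opts.Page = next
	}
}

// fakeLister filters by state and then pages, as a server-side filter would.
func fakeLister(all []token, calls *int) lister {
	return func(opts listOptions) ([]token, int, error) {
		*calls++
		var filtered []token
		for _, t := range all {
			if opts.State == "" || t.State == opts.State {
				filtered = append(filtered, t)
			}
		}
		start := (opts.Page - 1) * opts.PerPage
		if start >= len(filtered) {
			return nil, 0, nil
		}
		end := start + opts.PerPage
		next := opts.Page + 1
		if end >= len(filtered) {
			end, next = len(filtered), 0
		}
		return filtered[start:end], next, nil
	}
}

func main() {
	all := []token{{1, "revoked"}, {2, "active"}, {3, "active"}, {4, "expired"}, {5, "active"}}
	calls := 0
	found, _ := findToken(fakeLister(all, &calls), 5)
	fmt.Println(found != nil, calls)
}
```
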

Optional usability notes:

  • if we want this resource to be friendlier to use from kubectl, there currently is no CRD shortNames entry for ServiceAccountAccessToken. Adding a short name such as saat could make day-to-day usage a bit nicer, but this is non-blocking and purely a UX improvement.

Happy path and tests look good otherwise.

- self mode: validate the adopted PAT belongs to spec.forProvider.serviceAccountId
  (PersonalAccessToken.UserID) and surface a terminal error on mismatch, so a
  miswired credentials secret can no longer rotate/revoke the wrong service
  account's token while the reconcile still succeeds.
- owner mode: narrow the rotate->create fallback. Only fall through to a fresh
  create when the existing token is genuinely gone (404, or the specific
  'Token already revoked' 400). Any other error (generic 400, 403, 5xx,
  transport) now returns instead of minting a second token.
- cluster scope: fix self-mode detection for cluster-scoped resources. A
  cluster resource has no namespace and writes to a full SecretReference, so
  isSelfManaged must compare the credential secret namespace against the write
  reference's namespace. Implemented as a generator replacement so the cluster
  zz_ code is regenerated correctly.
- client: filter the service-account token list with state=active to avoid
  paginating through revoked/expired tokens on every Observe poll.
- crd: add the 'saat' shortName for friendlier kubectl usage.

Regenerated cluster-scoped code and CRDs via make generate.

Signed-off-by: Markus Siebert <markus.siebert@deutschebahn.com>
@markussiebert

Collaborator Author

Thanks for the thorough review @derbauer97 — and especially for verifying it end to end. All four points addressed in e5f52a3:

  1. Self-mode service-account validation. After self-inform we now check PersonalAccessToken.UserID against spec.forProvider.serviceAccountId and return a terminal error on mismatch, so a miswired credentials secret can no longer rotate/revoke the wrong service account's token while the reconcile still reports success. (Guarded on serviceAccountId being set, so it's a no-op when unset.)

  2. Narrowed owner-mode rotate→create fallback. The rotate response is now inspected and we only fall through to a fresh create when the token is genuinely gone — a 404, or the specific 400 "Token already revoked". Any other failure (generic 400, 403, 5xx, transport error) now returns instead of minting a second token. Matches the endpoint behavior you measured.

  3. Cluster-scoped self-mode detection. Fixed. A cluster-scoped resource has no namespace and writes to a full SecretReference, so isSelfManaged must compare the credential secret namespace against the write reference's namespace (ref.Namespace == w.Namespace) rather than cr.GetNamespace(). Since the cluster controller is generated, this is implemented as a replacement rule in hack/generate-cluster-scope.go; the namespaced path keeps the CR-namespace comparison (correct there, since it writes a LocalSecretReference). Regenerated zz_controller.go confirms the cluster variant now compares against w.Namespace.

  4. state=active list filter. GetServiceAccountAccessToken now sets State: gitlab.Ptr("active"), so Observe no longer paginates through revoked/expired tokens. A revoked/missing token still surfaces as not-found (nil result) → ResourceExists: false, which the controller already handles.
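
The self-mode ownership check from point 1 could look roughly like the sketch below. The selfToken type and validateOwner function are hypothetical; only the idea of comparing the token's user id with serviceAccountId, and skipping the check when serviceAccountId is unset, comes from the description above.

```go
package main

import "fmt"

// selfToken is a hypothetical stand-in for the self-inform response.
type selfToken struct {
	ID     int
	UserID int
}

// validateOwner returns an error when the adopted token belongs to a different
// user than the configured service account. A nil serviceAccountID means the
// field is unset, in which case the check is a no-op.
func validateOwner(tok selfToken, serviceAccountID *int) error {
	if serviceAccountID == nil {
		return nil
	}
	if tok.UserID != *serviceAccountID {
		return fmt.Errorf("token %d belongs to user %d, not service account %d",
			tok.ID, tok.UserID, *serviceAccountID)
	}
	return nil
}

func main() {
	want := 42
	fmt.Println(validateOwner(selfToken{ID: 7, UserID: 42}, &want))
	fmt.Println(validateOwner(selfToken{ID: 7, UserID: 99}, &want))
}
```
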

Plus the optional usability note: added the saat shortName to the CRD.

Tests: added self-mode match/mismatch cases, four owner-mode rotate-fallback cases (404 + revoked-400 fall through; generic-400 + transient do not, asserting create is never called), and a client test asserting state=active. go build, go vet, and the full go test ./... suite pass (cluster-scoped variants regenerated and covered).

@derbauer97

Contributor

I reran the provider locally in kind against the same disposable group after e5f52a3.

What now works from my retest:

  • namespaced owner-mode still works
  • namespaced self-managed mode still works
  • revoked-token recovery in owner mode now works as intended: after revoking the PAT out of band, the provider created a fresh token on the next reconcile and updated the external name (114062 -> 114064 in my run)
  • the narrowed rotate->create behavior is visible in the code and the revoked-token path is working in practice

What I still see not working:

  • cluster-scoped self-managed mode still does not activate in e2e

I reproduced this again by:

  • creating a cluster-scoped ServiceAccountAccessToken in owner mode
  • letting it publish its token secret
  • creating a cluster-scoped ProviderConfig that reads from that exact same secret
  • switching the existing token resource to that ProviderConfig

Expected:

  • SelfManaged=True

Actual:

  • SelfManaged=False
  • providerConfigRef.name=gitlab-cluster-self
  • controller stays on the owner-mode path and then repeatedly fails with 404 Not Found

I checked the code again and the remaining root cause seems to be in the legacy cluster ProviderConfig path, not in isSelfManaged() itself anymore:

  • pkg/cluster/controller/groups/serviceaccountaccesstokens/zz_controller.go:165-179

    • cluster isSelfManaged() now compares ref.Namespace == w.Namespace, which looks correct
  • but pkg/common/gitlab.go:130-157 (UseLegacyProviderConfig) still returns common.Config without populating CredentialsSecretRef

That means for cluster-scoped resources:

  • cfg.CredentialsSecretRef == nil
  • isSelfManaged() returns false
  • self-managed mode never activates even with correct secret wiring

The modern namespaced path does populate it:

  • pkg/common/gitlab.go:199-205

So from my retest, items 1/2/4 look good, but cluster self-managed mode still appears broken until the legacy config builder also carries CredentialsSecretRef through.

…nfig path

Cluster-scoped resources are LegacyManaged and resolve their credentials via
UseLegacyProviderConfig, which built a Config without CredentialsSecretRef. As
a result isSelfManaged() always saw a nil ref and self-managed mode never
activated for cluster-scoped ServiceAccountAccessTokens, even when the
ProviderConfig read from the exact secret the resource writes to (it stayed
SelfManaged=False / OwnerManaged and then 404'd on the owner path).

Populate CredentialsSecretRef from pc.Spec.Credentials.SecretRef in the legacy
path, mirroring the modern namespaced path, so self-mode detection works for
cluster-scoped resources too.

Co-authored fix and test verifying the legacy builder carries the ref.

Signed-off-by: Markus Siebert <markus.siebert@deutschebahn.com>
@markussiebert

Collaborator Author

Good catch @derbauer97 — that's exactly right. The isSelfManaged() namespace comparison was necessary but not sufficient: cluster-scoped resources are LegacyManaged and resolve credentials through UseLegacyProviderConfig, which built the Config without CredentialsSecretRef. So ref == nil and self-mode could never activate for cluster scope, regardless of how the secret was wired.

Fixed in 7be643b: UseLegacyProviderConfig now carries CredentialsSecretRef: pc.Spec.Credentials.SecretRef through, mirroring the modern namespaced path. The legacy ProviderConfig uses the same v2.CommonCredentialSelectors, so the field types line up directly.

Added a unit test (TestUseLegacyProviderConfigSetsCredentialsSecretRef) asserting the legacy builder populates the ref. go build, go vet, and the full go test ./... suite pass.
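
As a simplified illustration of why the missing field mattered, the sketch below contrasts a config builder that drops the secret reference with one that carries it through. All types and function names are hypothetical stand-ins and do not reproduce UseLegacyProviderConfig or the real Config struct.

```go
package main

import "fmt"

// secretRef, providerConfig and config are hypothetical stand-ins.
type secretRef struct {
	Namespace, Name, Key string
}

type providerConfig struct {
	Method    string
	SecretRef *secretRef
}

type config struct {
	Method               string
	CredentialsSecretRef *secretRef
}

// buildWithoutRef mimics the earlier behavior: the reference is not copied.
func buildWithoutRef(pc providerConfig) config {
	return config{Method: pc.Method}
}

// buildWithRef mimics the fix: the reference is carried through.
func buildWithRef(pc providerConfig) config {
	return config{Method: pc.Method, CredentialsSecretRef: pc.SecretRef}
}

// selfManaged can only be true when the config still knows which secret it
// was built from; a nil reference always yields false.
func selfManaged(cfg config, write secretRef) bool {
	ref := cfg.CredentialsSecretRef
	return cfg.Method == "PersonalAccessToken" && ref != nil && *ref == write
}

func main() {
	write := secretRef{Namespace: "crossplane-system", Name: "gitlab-sa-token", Key: "token"}
	pc := providerConfig{Method: "PersonalAccessToken", SecretRef: &write}

	fmt.Println(selfManaged(buildWithoutRef(pc), write))
	fmt.Println(selfManaged(buildWithRef(pc), write))
}
```
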

With this plus the earlier isSelfManaged namespace fix, the cluster-scoped self-managed path should now activate (SelfManaged=True) when the cluster ProviderConfig reads from the same secret the resource writes to. Would appreciate a re-run of your cluster-scoped self-mode repro to confirm on your end.

Move the provider-gitlab-local imports into their own trailing gci group to
satisfy the project's gci section order (standard, default, prefix).

Signed-off-by: Markus Siebert <markus.siebert@deutschebahn.com>
@derbauer97

Contributor

LGTM. @henrysachs or @dariozachow needs to approve :)

@markussiebert markussiebert merged commit 023b5d8 into master Jun 18, 2026
8 checks passed
Development

Successfully merging this pull request may close these issues.

Add a group ServiceAccount personal access token managed resource

3 participants