
cmd/cue: significant memory regression in v0.16.0 compared to v0.14.x #4298

@laughingman7743

Description


Summary

We observe a significant memory usage increase (up to +57%) when upgrading from CUE v0.14.2 to v0.16.0 for `cue cmd` operations on a large Kubernetes manifest codebase. This causes OOM kills in our CI environment (16 GiB RAM limit).

Environment

  • Large CUE codebase generating Kubernetes manifests
  • Shared packages (pkg/): ~95 CUE files, ~320 #Definition declarations (Kubernetes resource types, workflow definitions, etc.)
  • Heaviest service: ~720 CUE files total, split across 15 delivery directories
  • Each cue cmd invocation loads:
    • 90-155 CUE files in the target directory (each file typically defines one Kubernetes resource referencing shared #Definitions)
    • ~5 parent directories with additional CUE files (metadata, common config)
    • All shared package definitions via imports
  • The definitions heavily use closed structs (#Def) for Kubernetes resource types with deep nesting (CRDs like Argo Workflows, Flink operators, etc.)
  • Tested on macOS (Apple Silicon, arm64)

Reproduction

We run `cue cmd dump`, which evaluates the CUE and marshals the result to YAML via `yaml.MarshalStream`. The command definition (in a `*_tool.cue` file):

```cue
import (
	"encoding/yaml"
	"tool/cli"
	"example.com/pkg/k8s"
)

command: dump: cli.Print & {
	text: yaml.MarshalStream([for o in k8s.#InstallOrder for r in Delivery.resources if r.kind == o {r}])
}
```

We have 15 such directories (delivery units) in the service, each evaluated independently and sequentially.

Benchmark Results

Measured peak RSS (maximum resident set size) via /usr/bin/time -l on macOS:

| Directory | CUE files | v0.14.2 | v0.15.4 | v0.16.0 | v0.16.0 vs v0.14.2 |
| --- | --- | --- | --- | --- | --- |
| Dir A | 93 | 6.9 GiB / 25.7s | 9.2 GiB / 32.2s | 10.8 GiB / 16.2s | +57% mem, -37% time |
| Dir B | 155 | 10.1 GiB / 44.8s | 9.4 GiB / 52.4s | 11.9 GiB / 31.8s | +18% mem, -29% time |
| Dir C | 153 | 10.2 GiB / 43.2s | 9.0 GiB / 49.7s | 10.8 GiB / 33.9s | +6% mem, -22% time |
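For anyone reproducing these numbers without macOS's `/usr/bin/time -l`, peak RSS of a child process can also be captured from Python via `getrusage`. The child command below is only a placeholder that allocates ~50 MB; in our case it would be the `cue cmd dump` invocation:

```python
# Sketch: measure a child process's peak RSS, as a portable alternative
# to `/usr/bin/time -l` (macOS) / `/usr/bin/time -v` (GNU time on Linux).
import resource
import subprocess
import sys

# Placeholder child; replace with e.g. ["cue", "cmd", "dump"] in practice.
subprocess.run([sys.executable, "-c", "x = bytearray(50_000_000)"], check=True)

peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
unit = 1 if sys.platform == "darwin" else 1024  # ru_maxrss: bytes on macOS, KiB on Linux
print(f"peak child RSS ~ {peak * unit / 2**30:.3f} GiB")
```

`RUSAGE_CHILDREN` aggregates over all waited-for children, so run one benchmark command per process to keep measurements isolated.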

Additional data points (v0.14.2 vs v0.16.0)

| Directory | CUE files | v0.14.2 | v0.16.0 | Memory change |
| --- | --- | --- | --- | --- |
| Dir D | ~50 | 4.0 GiB / 12.9s | 6.1 GiB / 8.1s | +53% |
| Dir E | ~50 | 3.7 GiB / 11.8s | 5.8 GiB / 7.5s | +57% |
| Dir F | ~40 | 3.1 GiB / 9.4s | 4.7 GiB / 6.0s | +52% |
| Dir G | ~60 | 3.7 GiB / 12.3s | 6.0 GiB / 7.9s | +62% |
| Dir H | ~30 | 2.8 GiB / 12.0s | 5.5 GiB / 7.5s | +96% |

Observations

  • v0.16.0 is consistently faster (22-37% speed improvement) but uses significantly more memory (6-96% increase)
  • This appears to be a speed vs memory tradeoff in the evaluator, possibly related to caching changes mentioned in the v0.16.0 release notes ("typechecker caching")
  • The release notes state "memory usage dropped by up to 60% in some projects" — our workload shows the opposite trend
  • v0.15.4 shows mixed memory results (sometimes better, sometimes worse than v0.14.2) and is consistently the slowest

Impact

  • Our CI environment has a 16 GiB memory limit and cannot accommodate v0.16.0's memory usage for the heaviest directories
  • The process is OOM-killed during cue cmd dump
  • v0.14.2 was already tight (10.1-10.2 GiB peak) but fit within the 16 GiB limit
  • We are unable to upgrade to v0.16.0 due to this regression
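As a stopgap while this is investigated, we may try constraining the Go runtime inside the `cue` binary itself: `GOMEMLIMIT` (soft heap limit) and `GOGC` (GC aggressiveness) are standard Go runtime environment variables and apply to any Go program. The values below are illustrative, not validated, and the delivery path is hypothetical:

```python
# Sketch of a possible CI mitigation (untested; values are illustrative).
# cue is a Go binary, so GOMEMLIMIT/GOGC influence its garbage collector,
# trading some of v0.16.0's speed gain back for lower peak memory.
import os
import subprocess

env = dict(os.environ, GOMEMLIMIT="14GiB", GOGC="50")
# Real invocation (commented out here; needs the codebase and cue on PATH):
# subprocess.run(["cue", "cmd", "dump", "./deliveries/dir-a"], env=env, check=True)
print(env["GOMEMLIMIT"], env["GOGC"])
```

Whether this meaningfully lowers peak RSS for this workload is an open question; it does not address the underlying regression.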

Related Issues

CUE Versions Tested

  • v0.14.2 (latest v0.14.x)
  • v0.15.4 (latest v0.15.x)
  • v0.16.0 (latest)
