Summary
We observe a significant memory usage increase (up to +96% in our benchmarks) when upgrading from CUE v0.14.2 to v0.16.0 for `cue cmd` operations on a large Kubernetes manifest codebase. This causes OOM kills in our CI environment (16 GiB RAM limit).
Environment
- Large CUE codebase generating Kubernetes manifests
- Shared packages (`pkg/`): ~95 CUE files, ~320 `#Definition` declarations (Kubernetes resource types, workflow definitions, etc.)
- Heaviest service: ~720 CUE files total, split across 15 delivery directories
- Each `cue cmd` invocation loads:
  - 90-155 CUE files in the target directory (each file typically defines one Kubernetes resource referencing shared `#Definition`s)
  - ~5 parent directories with additional CUE files (metadata, common config)
  - All shared package definitions via imports
- The definitions heavily use closed structs (`#Def`) for Kubernetes resource types with deep nesting (CRDs like Argo Workflows, Flink operators, etc.)
- Tested on macOS (Apple Silicon, arm64)
Reproduction
We run `cue cmd dump`, which evaluates the CUE and marshals the result to YAML via `yaml.MarshalStream`. The command definition:

```cue
import (
	"encoding/yaml"
	"tool/cli"
	"example.com/pkg/k8s"
)

command: dump: cli.Print & {
	text: yaml.MarshalStream([for o in k8s.#InstallOrder for r in Delivery.resources if r.kind == o {r}])
}
```

We have 15 such directories (delivery units) in the service, each evaluated independently and sequentially.
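For context, an illustrative sketch of what one resource file in a delivery directory looks like; `#Deployment` and all field values are hypothetical stand-ins, only the `example.com/pkg/k8s` import and the `Delivery.resources` shape come from the command definition above:

```cue
package delivery

import "example.com/pkg/k8s"

// One Kubernetes resource per file, unified with a closed
// definition from the shared package (#Deployment is a
// hypothetical stand-in for one of our ~320 definitions).
Delivery: resources: deploy: k8s.#Deployment & {
	kind: "Deployment"
	metadata: name: "my-app"
	spec: replicas: 2
}
```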
Benchmark Results
Measured peak RSS (maximum resident set size) via `/usr/bin/time -l` on macOS:
| Directory | CUE files | v0.14.2 | v0.15.4 | v0.16.0 | v0.16.0 vs v0.14.2 |
|---|---|---|---|---|---|
| Dir A | 93 | 6.9 GiB / 25.7s | 9.2 GiB / 32.2s | 10.8 GiB / 16.2s | +57% mem, -37% time |
| Dir B | 155 | 10.1 GiB / 44.8s | 9.4 GiB / 52.4s | 11.9 GiB / 31.8s | +18% mem, -29% time |
| Dir C | 153 | 10.2 GiB / 43.2s | 9.0 GiB / 49.7s | 10.8 GiB / 33.9s | +6% mem, -22% time |
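The percentage columns follow directly from the raw numbers; as a sanity check, the Dir A deltas can be recomputed like this (values copied from the table above):

```shell
# Dir A: 6.9 GiB / 25.7 s on v0.14.2 vs 10.8 GiB / 16.2 s on v0.16.0
awk 'BEGIN { printf "+%.0f%% mem, %.0f%% time\n", (10.8 - 6.9) / 6.9 * 100, (16.2 - 25.7) / 25.7 * 100 }'
# prints: +57% mem, -37% time
```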
Additional data points (v0.14.2 vs v0.16.0)
| Directory | CUE files | v0.14.2 | v0.16.0 | Memory change |
|---|---|---|---|---|
| Dir D | ~50 | 4.0 GiB / 12.9s | 6.1 GiB / 8.1s | +53% |
| Dir E | ~50 | 3.7 GiB / 11.8s | 5.8 GiB / 7.5s | +57% |
| Dir F | ~40 | 3.1 GiB / 9.4s | 4.7 GiB / 6.0s | +52% |
| Dir G | ~60 | 3.7 GiB / 12.3s | 6.0 GiB / 7.9s | +62% |
| Dir H | ~30 | 2.8 GiB / 12.0s | 5.5 GiB / 7.5s | +96% |
Observations
- v0.16.0 is consistently faster (22-37% speed improvement) but uses significantly more memory (6-96% increase)
- This appears to be a speed vs memory tradeoff in the evaluator, possibly related to caching changes mentioned in the v0.16.0 release notes ("typechecker caching")
- The release notes state "memory usage dropped by up to 60% in some projects" — our workload shows the opposite trend
- v0.15.4 shows mixed memory results (sometimes better, sometimes worse than v0.14.2) and is consistently the slowest
Impact
- Our CI environment has a 16 GiB memory limit and cannot accommodate v0.16.0's memory usage for the heaviest directories
- The process is OOM-killed during
cue cmd dump - v0.14.2 was already tight (10.1-10.2 GiB peak) but fit within the 16 GiB limit
- We are unable to upgrade to v0.16.0 due to this regression
Related Issues
- #2853 — Performance: closedness algorithm consuming lots of memory
- #2850 — Performance (umbrella issue)
CUE Versions Tested
- v0.14.2 (latest v0.14.x)
- v0.15.4 (latest v0.15.x)
- v0.16.0 (latest)