Notes on `cue cmd` #3917

infogulch · 2025-05-10T06:09:22Z

I've spent over a decade implementing existing and custom build and deployment tools. Most tools have their niche, all have flaws. When I came across CUE a couple years ago it stood out as potentially sidestepping the biggest flaws in the systems I've used. I'm comfortable with most programming paradigms, but I have a soft spot for prolog and unification more broadly, so perhaps my eye for CUE is not surprising. Last year I chose to learn CUE by experimenting with cue cmd as a substitute Makefile and CI for a little http/html rendering engine based on Go templates I was building. (I have been advised to avoid trying 3+ experiments in one project; alas, that advice has never stuck.) The result is make_tool.cue which acts as both a local makefile/task runner and CI to run builds -- 6 Go platform targets, as a module in a custom Caddy build, and docker image -- including scripted integration tests to validate most of these artifacts after the release builds are complete.

This post is a synthesis of my thoughts on how to improve cue cmd for this and similar purposes, based on the experiment and my past experience.

With some work cue cmd has the potential to be a world class task execution engine. #1325 contains a loose collection of possible improvements to cue cmd, which is good to have, but it doesn't try to present a coherent vision of where the tool should go.

I think cue cmd can successfully target Makefiles, task files, bash run scripts, proprietary CI scripts like GitHub Actions workflows, and maybe eventually Dockerfiles, bazel, and nix flakes.

This list contains, in my opinion, the most important things to improve in cue cmd to make it a more useful tool for these use cases, numbered for easy reference in the discussion below. Sections are organized in a rough priority order.

Let me know what you think!

Debuggability

cue cmd incorporates layers of complex interacting systems. Users need to be able to debug each of these layers.

CUE Definition Layer. For practical purposes CUE is a full programming language (Turing completeness is not relevant here), so users need to be able to debug it like it's a programming language. In addition, cue cmd may be a user's first foray into CUE in general, which makes getting feedback in this layer important. That means at least the ability to export expanded cue after parsing but before flow execution begins to debug their initial definitions. Validating that tasks match their tool schema definition would also be good. Currently, CUE export refuses to even recognize tool files and cannot export them, and the workarounds are tedious.

1. Allow exporting cue tool files
2. Validation that tasks match the corresponding tool schema

CUE Flow Layer. The mermaid diagram export is on the right path and it's nice to have a straightforward general solution, but it's difficult to use in practice and is hard to read because there is no well-defined way to lay out the tasks so they end up jumbled when rendered. The diagram would be easier to read if the mermaid output used a diagram type that could represent time; a sequence diagram perhaps. Copying hundreds of lines out of a terminal to paste into a mermaid renderer every time a task updates just to see anything at all is tedious. It would be nice if the terminal displayed the task execution graph in a way that indicated the live status without requiring external tools. Docker's interactive build UI manages to display concurrent command execution in a coherent way; perhaps this could be an inspiration.

3. Change mermaid diagram type to represent start and end time of tasks in addition to their dependencies
4. TUI to depict live task execution graph status and logs

Logging

While concurrent task execution is an excellent feature, its utility is ruined by interleaved stderr/stdout logs. One solution that I've mentioned before is prefixing log outputs with a unique task id. Cue itself logging task start and end times could be useful for debugging, as well as being able to tee output to a file. Besides prefixes, another option is to buffer task output and send it all out in a group. This is an option offered by go-task which makes for a very nice integration with GitHub Actions' ::group:: command.

5. Prefix log outputs with task id
6. Prefix log outputs with a timestamp
7. Log task start and task end
8. Tee logs to both stdout and a file
9. Buffer then output all logs for a task at once with begin and end markers
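
To make the prefixing and grouping ideas concrete, here is a minimal sketch in Go. It is not how cue cmd handles output today; `prefixWriter`, `grouped`, and `prefixed` are hypothetical names invented for illustration. The only external convention it relies on is the GitHub Actions `::group::` / `::endgroup::` workflow command pair mentioned above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// prefixWriter prepends "[id] " to each complete line before forwarding it.
// The mutex is meant to be shared by all tasks writing to the same output,
// so that concurrent writers never split each other's lines.
type prefixWriter struct {
	mu  *sync.Mutex
	out io.Writer
	id  string
	buf bytes.Buffer // trailing partial line, held until its newline arrives
}

func (w *prefixWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.buf.Write(p)
	for {
		i := bytes.IndexByte(w.buf.Bytes(), '\n')
		if i < 0 {
			return len(p), nil // no complete line left; keep the remainder buffered
		}
		if _, err := fmt.Fprintf(w.out, "[%s] %s", w.id, w.buf.Next(i+1)); err != nil {
			return len(p), err
		}
	}
}

// grouped wraps one task's fully buffered output in begin and end markers,
// here the GitHub Actions group commands.
func grouped(id, output string) string {
	return "::group::" + id + "\n" + strings.TrimRight(output, "\n") + "\n::endgroup::\n"
}

// prefixed feeds chunks through a prefixWriter and returns what it emitted.
func prefixed(id string, chunks ...string) string {
	var out strings.Builder
	w := &prefixWriter{mu: new(sync.Mutex), out: &out, id: id}
	for _, c := range chunks {
		io.WriteString(w, c)
	}
	return out.String()
}

func main() {
	// A line split across two writes is still emitted as one prefixed line.
	fmt.Print(prefixed("build", "compiling\nlink", "ing\n"))
	fmt.Print(grouped("test", "ok pkg/a\nok pkg/b\n"))
}
```

The sketch never flushes a final line that lacks a trailing newline; a real implementation would need to do that when the task ends.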

Metadata

It's important for users to be able to discover available commands without diving into tool source files, especially when cue cmd is used in larger projects with many tasks.

10. List available commands e.g. cue cmd --list
11. Include tag attributes in listings as command options
12. Provide a mechanism to hide 'internal use' commands to reduce clutter in listings, perhaps via attribute in the task definition
13. Support adding descriptions to commands, which are displayed in the command listing

Documentation

Browsing the CUE repository or the godoc site for .cue and .go files to discover available tasks and their behavior is inefficient and frustrating. Comprehensive, accessible documentation is essential for adoption and effective use.

14. Integrate documentation for each tool module and task into the official CUE documentation website, with examples, schema definitions, and common use cases.

Control Flow

The current control flow mechanisms ($after) in cue cmd are too simple. Mutexes/semaphores as suggested in #1325 could do "anything" but I suspect they will be too low-level for common use cases. A few higher-level abstractions would improve usability and readability.

15. Support implicit sequencing by allowing tasks to be defined in a list, where order implies sequential dependencies.
16. Introduce a defer mechanism to ensure specific tasks (e.g., cleanup tasks) execute regardless of whether dependencies succeed or fail.
17. Allow individual task dependencies to be specified clearly, either via a $done marker or task embedding; if the latter, something will need to be done to align with debuggability goal 1 above.
18. Provide high-level primitives for common patterns, such as retries or timeouts, to reduce the need for users to implement these manually.
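
The semantics I have in mind for sequencing, defer, and retry with timeout can be sketched outside CUE. The following Go is only an illustration of the intended behaviour, not a proposal for cue cmd's API; `Task`, `Sequence`, `WithCleanup`, `Retry`, and `demo` are all hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Task is a unit of work that can fail.
type Task func(ctx context.Context) error

// Sequence runs tasks in list order: each one implicitly depends on the
// previous one, and the first failure stops the chain.
func Sequence(tasks ...Task) Task {
	return func(ctx context.Context) error {
		for _, t := range tasks {
			if err := t(ctx); err != nil {
				return err
			}
		}
		return nil
	}
}

// WithCleanup runs cleanup whether or not body succeeds (the "defer" idea).
// The body's error wins; a cleanup error is reported only if body succeeded.
func WithCleanup(body, cleanup Task) Task {
	return func(ctx context.Context) error {
		err := body(ctx)
		if cerr := cleanup(context.Background()); err == nil {
			err = cerr
		}
		return err
	}
}

// Retry runs t up to attempts times (attempts >= 1), giving each attempt
// its own timeout.
func Retry(attempts int, timeout time.Duration, t Task) Task {
	return func(ctx context.Context) error {
		var err error
		for i := 0; i < attempts; i++ {
			attemptCtx, cancel := context.WithTimeout(ctx, timeout)
			err = t(attemptCtx)
			cancel()
			if err == nil {
				return nil
			}
		}
		return fmt.Errorf("after %d attempts: %w", attempts, err)
	}
}

// demo records the order in which steps run for a start/test/stop pipeline.
func demo() ([]string, error) {
	var events []string
	step := func(name string, err error) Task {
		return func(context.Context) error { events = append(events, name); return err }
	}
	calls := 0
	flaky := func(context.Context) error {
		calls++
		events = append(events, fmt.Sprintf("test#%d", calls))
		if calls < 3 {
			return errors.New("not ready")
		}
		return nil
	}
	pipeline := WithCleanup(
		Sequence(step("start", nil), Retry(3, time.Second, flaky),
			step("fail", errors.New("boom")), step("skipped", nil)),
		step("stop", nil),
	)
	err := pipeline(context.Background())
	return events, err
}

func main() {
	events, err := demo()
	fmt.Println(events, err)
}
```

In the demo the "stop" step still runs after "fail" returns an error, and "skipped" never runs, which is the behaviour items 15, 16, and 18 are asking for.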

Flow expression propagation limitations

Task outputs are critical for dynamic workflows, but non-task expressions derived from task outputs are not handled correctly and fail evaluation too early in value flow propagation.

19. Improve support for propagating non-task expressions derived from task outputs, ensuring they can be used in subsequent tasks or conditions without requiring awkward workarounds.

Task Deduplication

Avoiding redundant task execution improves performance, especially in large workflows. In addition, deduplication completely eliminates any need for the insane overcomplicated gymnastics I performed to simulate shared global variables in make_tool.cue.

20. Implement task fingerprinting to detect when inputs and dependencies haven’t changed, skipping execution and reusing outputs of a previously executed task.
21. Cache and reuse evaluations across executions, with options to invalidate the cache explicitly or based on time/dependency changes.

Cross platform shell

Shell command behavior varies across systems, even within the same OS, due to differences in shell selection, version, and configuration. A consistent shell experience is crucial for portability.

22. Adopt a built-in shell interpreter library, such as mvdan/sh, to standardize shell command behavior across platforms, ensuring consistent syntax and execution regardless of the host environment.
23. Provide configuration options to override the default shell interpreter for cases where users need to integrate with specific shells or tools, balancing consistency with flexibility.

Hermetic / Reproducible Environments

Reproducible environments are essential for consistent task execution, especially in CI/CD pipelines or when targeting tools like Dockerfiles or nix flakes.

24. Support hermetic task execution by allowing tasks to specify their runtime environment, such as specific versions of tools or dependencies, using a CUE-based schema.
25. Integrate with container runtimes (e.g., Docker or Podman) to execute tasks in isolated, reproducible environments, with configuration options to define the container image and settings.
26. Provide a mechanism to pin dependency versions (e.g., for external tools or libraries) in task definitions, ensuring reproducibility across executions and systems.
27. Support generating environment definitions compatible with tools like nix or bazel, enabling cue cmd to serve as a frontend for these systems.

myitcv · 2025-08-22T13:00:21Z

Maintainer

@infogulch just picking up on my comment in #1325 (comment).

Firstly a huge thank you for providing this usage report. Incredibly valuable.

We are in the process of picking up the discussion in #1325. I am trawling through all linked issues, discussions etc for relevant inputs, and this is a comprehensive analysis. Thank you.

Some questions based on what you presented.

Flow expression propagation limitations

Task outputs are critical for dynamic workflows, but non-task expressions derived from task outputs are not handled correctly and fail evaluation too early in value flow propagation.

Improve support for propagating non-task expressions derived from task outputs, ensuring they can be used in subsequent tasks or conditions without requiring awkward workarounds.

Please can you expand on what you mean here?

Task Deduplication

Avoiding redundant task execution improves performance, especially in large workflows. In addition deduplication completely eliminates any need for the insane overcomplicated gymnastics I performed to simulate shared global variables in make_tool.cue.

Implement task fingerprinting to detect when inputs and dependencies haven’t changed, skipping execution and reusing outputs of a previously executed task.

Cache and reuse evaluations across executions, with options to invalidate the cache explicitly or based on time/dependency changes.

Please can you expand on what capabilities you would want/expect from the workflow runner? The analogy I'm most obviously drawing here is with buildkit and its approach to caching. However that approach makes me nervous. The workflow runner/orchestrator (i.e. buildkit) can never precisely know if a step can be skipped or not. Only the task itself can, in certain situations, know that answer definitively. As a relatively straightforward example, take a Go build. When it comes to knowing whether a Go build/test/etc step needs to be re-run or not, the only safe answer can be provided by cmd/go itself. If it detects a cache hit, then it does nothing. Otherwise, it does some work. If we look at a workflow that calls cmd/go, to faithfully and precisely recreate "do we need to run this step" we would need to reimplement all the logic from within cmd/go. Far better to my mind is to simply invoke the command with relevant caches in place. Incidentally, this is also the approach that https://namespace.so/ have taken with respect to speeding up CI workflows. The Go build/test example is in many respects an incredibly simple example: what about a step that might have side effects, local or remote? How would the workflow runner/orchestrator know about these? It feels like the wrong rabbit hole to be going down to my mind. If instead all programs/tasks move to a model of using content-based, read-only append caches (like Go) which cleanly and straightforwardly allow the program itself to work out "do I need to do anything" then any runner/orchestrator can benefit.
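
To illustrate the model where the program itself decides whether there is anything to do, here is a toy sketch in Go. It is only loosely in the spirit of a content-based, append-only cache and is not a description of how cmd/go is implemented; `actionKey`, `cache`, `compile`, and `demoRuns` are hypothetical names, and the version and source strings are placeholders.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// actionKey hashes everything that can influence a step's output.
// Each part is length-prefixed so that ("ab", "c") and ("a", "bc") differ.
func actionKey(parts ...string) [sha256.Size]byte {
	h := sha256.New()
	for _, p := range parts {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(p)))
		h.Write(n[:])
		h.Write([]byte(p))
	}
	var k [sha256.Size]byte
	copy(k[:], h.Sum(nil))
	return k
}

// cache is append-only in this sketch: an entry is written once, never changed.
type cache map[[sha256.Size]byte]string

// compile stands in for a tool that consults its own cache before working.
// The runner calling it does not need to know whether it was a hit or a miss.
func compile(c cache, toolVersion, source string, runs *int) string {
	k := actionKey(toolVersion, source)
	if out, ok := c[k]; ok {
		return out // cache hit: nothing to do
	}
	*runs++ // cache miss: do the real work and record the result
	out := "object(" + source + ")"
	c[k] = out
	return out
}

// demoRuns invokes the tool three times and reports how often it did real work.
func demoRuns() int {
	c, runs := cache{}, 0
	compile(c, "tool-v1", "package a", &runs)
	compile(c, "tool-v1", "package a", &runs) // hit: same inputs
	compile(c, "tool-v1", "package b", &runs) // miss: source changed
	return runs
}

func main() {
	fmt.Println("real work done", demoRuns(), "times")
}
```

The point of the sketch is the division of labour: the runner always invokes the step, and the step's own cache makes the repeated invocation cheap.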

Cross platform shell

Shell command behavior varies across systems, even within the same OS, due to differences in shell selection, version, and configuration. A consistent shell experience is crucial for portability.

Adopt a built-in shell interpreter library, such as mvdan/sh, to standardize shell command behavior across platforms, ensuring consistent syntax and execution regardless of the host environment.

Provide configuration options to override the default shell interpreter for cases where users need to integrate with specific shells or tools, balancing consistency with flexibility.

Very pleased to see another happy user of @mvdan's great package! Please can you expand on where you see the shell aspect being relevant? I ask because the existing exec.Run doesn't use a shell in any way. Although granted that GitHub Actions and friends do, that's potentially more a function of the fact that they don't have the same richness of CUE when it comes to specifying/controlling environment etc other than via mutating GITHUB_ENV and friends.

Hermetic / Reproducible Environments

Reproducible environments are essential for consistent task execution, especially in CI/CD pipelines or when targeting tools like Dockerfiles or nix flakes.

No questions about this point, just to say that in the analysis I'm currently pulling together I am going to give this almost an entire section of its own because I think it's a crucial and fascinating aspect.

myitcv Aug 28, 2025
Maintainer

@infogulch

Apologies for the delay in replying to this point. I drafted a fairly extensive reply only to lose it because I carelessly drafted it in GitHub discussions (where a draft apparently doesn't survive a browser restart, unlike my general experience with issues).

> Regarding task deduplication, 20 & 21, consider make_tool.cue.
>
> <snip>
>
> Turns out it is very annoying. In addition, while repeating the same simple tasks many times may not consume a lot of wall time, they will definitely make the system as a whole less debuggable, polluting the mermaid output etc.

Thanks for expanding on this point. And an even bigger thank you for sharing the experience report of make_tool.cue - this really is incredibly useful.

My first thought on reading your comment (and the linked code) was that the problem presented by the situation you describe is more a question of how to compose tasks and introduce dependencies on parts of the composed tasks/workflows.

What's intriguing is that the pattern you've landed on depends on an interesting (but potentially brittle) feature of cue cmd. Generally speaking we declare commands with a set of tasks, potentially with some dependencies between those tasks, and according to the dependency graph we execute the tasks as concurrently as possible. Tasks are declared at regular (non-hidden, non-definition) paths within a named command, per the command specification. When asked to run a command, for example cue cmd x, the runner recursively walks the regular fields in the structure at command.x to discover tasks. But cue cmd also "discovers" tasks via references, even if a referenced task is not "contained" in the command.x structure, and even if the reference is via, say, a hidden field.

Looking at your example, the output of cue cmd in debug mode gives that hint in the case of your build_caddy command:

$ CUE_DEBUG=toolsflow !!
CUE_DEBUG=toolsflow cue cmd build_caddy
tools/flow task dependency graph:
```mermaid
graph TD
  t0("command.build_caddy.xbuild [Waiting]")
  t0-->t1
  t0-->t2
  t1("meta._commands.reporoot [Ready]")
  t2("meta._commands.env [Ready]")
```
...

The meta._commands.reporoot task is not part of the command.build_caddy namespace and, not only that, the path defining the task contains a hidden field.

The only requirement on a task declaration is that it must be declared at a path; it cannot simply be an expression. (My analysis notes that this probably has some significance when it comes to logging, tracing, debugging etc, i.e. it is an important and "good" constraint, but I also acknowledge that more generally for function calls such a constraint is likely an unnecessary burden). command.build_caddy.xbuild, discovered by walking command.build_caddy, and meta._commands.env, discovered by a reference from an otherwise discovered task, both satisfy this requirement.

So what's the significance of this point about a path? Well the path gives us a name, a way of referring to a task. It actually gives the task an identity, which in some respects is the "fingerprint" I think you're after. It provides a point of synchronisation and coordination via references.

Either way, I think you're actually solving this problem in what I (personally) would consider the "right" way given the way the cue cmd is built today. What you're doing is injecting the dependency on vars/metadata into tasks, albeit indirectly via vars which are declared in terms of the outputs of those commands. As the workflow author you need to be specific on the dependency chain of commands. Only you can know when the metadata commands can be run (in terms of the state of the system), and if indeed the metadata commands need to be re-run after task X (which might have invalidated parts of the metadata, for example). Your approach achieves that.

The analysis of cue cmd I'm preparing very much picks up on this point of ensuring we not only make it easy to compose commands and tasks, but also document well how this is done.

Incidentally, one thing that looking more closely at your example has highlighted is that we could/should write an FAQ answer to the question "how do I declare a task that only gets run on demand?" where "on demand" means "is referenced by an otherwise discovered task". This pattern of a hidden field (which could be contained by the "calling" workflow) is actually rather elegant in that respect.

To expand slightly on the point above of how this feature of referenced tasks being discovered can be brittle. Consider the following test:

exec cue cmd x
cmp stdout stdout.golden

-- cue.mod/module.cue --
module: "cue.example"
language: {
	version: "v0.15.0"
}
-- p/p.cue --
package p

import (
	"tool/exec"
)

metadata: exec.Run & {
	cmd: ["ls"]
}
-- x_tool.cue --
package x

import (
	"tool/cli"
)

helperFlow: {
	echo: cli.Print & {
		text: "hello"
	}
}
command: x: helperFlow & {
	print: helperFlow.echo
}
-- stdout.golden --
hello

At first glance we might expect this to pass because we're ultimately referencing the same task, helperFlow.echo, but the test fails:

> exec cue cmd x
[stdout]
hello
hello
> cmp stdout stdout.golden
diff stdout stdout.golden
--- stdout
+++ stdout.golden
@@ -1,2 +1,1 @@
 hello
-hello

FAIL: repro.txtar:2: stdout and stdout.golden differ

i.e. the task runs twice. It feels a bit too easy to make this mistake. Whilst careful documentation of best practice would no doubt help, perhaps there is a more robust solution. Again, this is picked up in the analysis.

> Back to deduplication, I think a better way to solve my problem of multiple invocations of the metadata tasks would be for cue cmd to identify that these commands are duplicated and only execute the duplicate tasks once and substitute the result of the first run into the other invocations

Related to my previous point regarding BuildKit, I'm not sure cue cmd (v2) can ever know, in the general case, that it is safe to do this, in terms of the side effects a command might have and how the sequencing/timing is or isn't relevant to other commands. Per my comments above, I think instead we need to look at ways of expressing the intent of reuse via dependency injection or similar. Honestly what you have right now is a very good example, for this discussion at least!

> Just to throw an idea out there, maybe add a new fingerprint: string field to every task, which is filled in with something unique by default, but if the user overwrites it then tasks with the same fingerprint are combined into one and only executed once. There would be a bunch of details to hash out here but I think something like this could work.

Per my comment above, I think you are already relying on a kind of fingerprinting: the path where a task is declared.

> Your point that maybe we're barking up the wrong tree here is well received. I can accept that running git revision blah blah a dozen times may seem wasteful but is not actually a problem in practice, and I may have gone overboard trying to solve a non-issue. From my perspective the whole project was an experiment, and I wanted to know how annoying it would be if I ever did actually need to avoid running these kinds of tasks multiple times.

More great points! One critical thing that working through this example has given us is a concrete example against which we can ask questions, make suggestions, observations etc. We can measure up a proposed v2 design against it as well. That said, I think we need to double down on efforts to flesh out real examples to help steer this conversation. We're looking for those edge cases, to help guide our understanding of "what's wrong". In that vein, if you have any more examples we can talk through please share!

myitcv Aug 28, 2025
Maintainer

@infogulch

> Yes, exec.Run is a simple wrapper around Go's exec.Command, ...
>
> <snip>
>
> ... This goes a long way to solve the windows problem, would solve the shell consistency problem, and would be a good start to making fully hermetic / reproducible environments actually usable.

Again the fact that we have an example here in the form of make_tool.cue is gold.

A few points, please excuse the bullet point form:

My analysis picks up the question of server processes in loose terms, but your run_caddy example has caused me to add a point about background jobs, daemons, job control. Thank you.
I note that because you are having to handle job control in a primitive way you can't take advantage of exec.Run's stdout and stderr, and hence have to do some lifting yourself. FWIW I don't think you should need to drop down to shell for either or both.
@rogpeppe did some fascinating work on tagged interpolations. The basic idea being that CUE can make it much easier to "harden" (from a security perspective) code that is ultimately evaluated/executed, reducing the risk of injection mistakes/attacks. Here is a simple example taken from one of the tests, that shows how easy it is to generate safer bash with all of the quoting of interpolated values taken care of:

exec cue export --out yaml
cmp stdout stdout.golden

-- x.cue --
package example

import "sh"

filename: "hell 'o.cue \" $foo"
script: sh.Format """
	# single quotes
	ls -l 'foo \(filename)'
	# double quotes
	ls -l "foo $HOME \(filename)"
	"""
-- stdout.golden --
filename: hell 'o.cue " $foo
script: |-
  # single quotes
  ls -l 'foo hell '"'"'o.cue " $foo'
  # double quotes
  ls -l "foo $HOME hell 'o.cue \" \$foo"

I reference this experiment because I think that whilst in many situations we can avoid dropping down to shell, in others we really can't and so having a robust approach to working at that interface is important as well.
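
For readers unfamiliar with the quoting trick visible in the single-quoted line of the golden output above, here is a small Go sketch of that one rule. It is not the implementation of the experimental `sh` package and covers only the single-quote case; `shSingleQuote` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// shSingleQuote wraps s in single quotes for a POSIX shell. Inside single
// quotes nothing is special except the single quote itself, so each one is
// emitted as: close quote, a double-quoted single quote, reopen quote.
func shSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'"'"'`) + "'"
}

func main() {
	filename := `hell 'o.cue " $foo`
	// Reproduces the single-quoted ls line from the golden output above.
	fmt.Println("ls -l " + shSingleQuote("foo "+filename))
}
```

The double-quoted case in the same golden output needs a different rule (escaping `"` and `$` with backslashes), which is part of why having the tool pick the right escaping per context is attractive.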

This pattern of || in bash is an interesting one to consider in the context of how to succinctly write that in cue cmd v2. And not an uncommon pattern I would posit either - i.e. many will reach for this. Again, my analysis now includes a reference to this.
Also noting the use of bash -x - to achieve logging I assume? I have a reference in the analysis that talks about this kind of debugging/tracing, different levels of verbosity etc.

Assume for one second that all of these points were addressed in cue cmd v2 - do you still think you would prefer to use the bash -c pattern? I ask because it's one of the main areas where make_tool.cue loses structure and relies on runtime failure modes.

infogulch Aug 29, 2025
Author

Losing long drafts sends me down a roller coaster of all five stages of grief, I appreciate your persistence!

Yes the biggest issue in make_tool.cue is how to compose tasks, and I'm delighted to hear that composition will feature in your analysis.

Your description of how cue cmd finds commands is interesting because it unifies with my mental model, but I might describe it differently: References have no semantic meaning. That is, by the time cue cmd starts walking the object tree, all task references have been resolved and replaced with deep copies of the referenced object. The only thing that's real is the final "task path", as you put it. If two tasks have different paths then they are both executed with no regard to how they were defined or if one was created as a reference from another, etc.

My perception is that cue cmd is evaluated in two stages: the first stage imports modules and expands references etc (as much as possible), then in the second stage it starts walking the expanded (basically json) object. I know this isn't exactly correct but it's been a useful mental shortcut to imagine how it works. The fact that this is complicated and unintuitive is why exporting is literally number 1 in my list above.

You understand the first half of the "fingerprint field" idea: that the default fingerprint is equivalent to the task path, which is correct. The interesting half of the idea is that you can override the fingerprint so that tasks with different paths can have the same fingerprint. With this feature, when cue cmd encounters a task with a fingerprint that matches another task that has already been executed, then it will not be run again and instead the result of the first execution would be substituted in its place. This would effectively give the author fine-grained control over task deduplication.

Consider how I might have structured make_tool.cue #vars with this fingerprint feature:

#vars: {
    root: exec.Run & { ..., fingerprint: "root" }
    version: exec.Run & { ..., fingerprint: "version" }
}

command: a: {
    vars: #vars
    print: cli.Print & { text: "The version is: \(vars.version)" }
}

command: b: {
    vars: #vars
    print: cli.Print & { text: "The root path is: \(vars.root)" }
}

command: composed: {
    a: command.a
    b: command.b
}

This is a very natural, almost naively simple composition. Now consider how cue cmd composed would see this (as I understand it):

command: composed: {
    a: {
        vars: {
            root: exec.Run & { ..., fingerprint: "root" }
            version: exec.Run & { ..., fingerprint: "version" }
        }
        print: cli.Print & { text: "The version is: \(vars.version)" }
    }
    b: {
        vars: {
            root: exec.Run & { ..., fingerprint: "root" }
            version: exec.Run & { ..., fingerprint: "version" }
        }
        print: cli.Print & { text: "The root path is: \(vars.root)" }
    }
}

Here, since command.composed.a.vars.root.fingerprint == command.composed.b.vars.root.fingerprint, only one of the *.vars.root tasks will actually be executed, and its result will be substituted for the other.

A fingerprint field would give authors basically unlimited control (and ability to footgun, admittedly) to deduplicate tasks while piercing through any abstraction and without requiring explicit references (which are fake anyway). Notably, this lifts the burden off of cue to know when to deduplicate.
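
The runner-side behaviour this implies is small. As a sketch only (Go, with hypothetical names `Runner`, `entry`, and `demoExecutions`; nothing here exists in cue cmd), a task's fingerprint defaults to its path, and tasks sharing a fingerprint execute once and share the result:

```go
package main

import (
	"fmt"
	"sync"
)

// entry holds the single execution and result for one fingerprint.
type entry struct {
	once sync.Once
	out  string
	err  error
}

// Runner executes each distinct fingerprint at most once.
type Runner struct {
	mu   sync.Mutex
	seen map[string]*entry
}

// Run executes task unless a task with the same fingerprint already ran,
// in which case the earlier result is substituted. An empty fingerprint
// falls back to the task path, which is unique per task.
func (r *Runner) Run(path, fingerprint string, task func() (string, error)) (string, error) {
	if fingerprint == "" {
		fingerprint = path
	}
	r.mu.Lock()
	if r.seen == nil {
		r.seen = map[string]*entry{}
	}
	e, ok := r.seen[fingerprint]
	if !ok {
		e = &entry{}
		r.seen[fingerprint] = e
	}
	r.mu.Unlock()
	// Concurrent callers with the same fingerprint block here until the
	// first execution finishes, then all see the same result.
	e.once.Do(func() { e.out, e.err = task() })
	return e.out, e.err
}

// demoExecutions counts how often the underlying task really runs.
func demoExecutions() int {
	var r Runner
	runs := 0
	root := func() (string, error) { runs++; return "/repo", nil }
	r.Run("command.composed.a.vars.root", "root", root)
	r.Run("command.composed.b.vars.root", "root", root) // deduplicated
	r.Run("command.other.vars.root", "", root)          // no override: runs again
	return runs
}

func main() {
	fmt.Println("executions:", demoExecutions())
}
```

This also makes the footgun visible: two unrelated tasks that accidentally share a fingerprint would silently share a result.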

Very glad to hear you will be looking at job control, make_tool.cue bends over backwards to work around this limitation and job control is essential for the use-cases mentioned in the OP. This pattern is used 3 times in make_tool.cue to start the server, run some http tests, then stop it: cli, docker, and also caddy which you noticed and is a bit more explicit because kill caddy didn't work iirc. This pattern kinda works but it's extremely brittle because if a task fails between start and kill then it leaves the background job running and I have to kill it manually. My OP didn't mention the example, but kinda hinted at this issue in the Control Flow section with 16 and 18.

If I can ask santa for anything I'd add "ability to trigger a waiting task to start after a matching substring is found in the stdout of an exec task", because it would allow me to eliminate task.test.ready which repeatedly pings the http port of the server until it returns ready. I'd prefer to trigger the tests based on when the server's ready message appears in stdout, because the way it is now the tests wait 5 seconds even if the server unexpectedly exits immediately.
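
A sketch of the trigger I mean, written in Go purely to show the semantics (this is not an existing cue cmd feature; `waitFor`, `readyIn`, and `errExited` are hypothetical names, and the sample output lines are made up). The key property is that it fails immediately if the output ends before the marker appears, rather than waiting out a fixed delay:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

var errExited = errors.New("output ended before ready marker appeared")

// waitFor reads r line by line and returns nil as soon as a line contains
// marker. If the stream ends first (for example because the server exited),
// it returns errExited right away instead of waiting for a timeout.
// In real use r would be the process's stdout pipe, and the caller would
// have to keep draining it after the marker is seen.
func waitFor(r io.Reader, marker string) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if strings.Contains(sc.Text(), marker) {
			return nil
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return errExited
}

// readyIn is a convenience wrapper for trying waitFor on a fixed string.
func readyIn(output, marker string) bool {
	return waitFor(strings.NewReader(output), marker) == nil
}

func main() {
	fmt.Println(readyIn("loading config\nserver listening on :8080\n", "listening"))
	fmt.Println(readyIn("loading config\nfatal: port in use\n", "listening"))
}
```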

Tagged interpolations look great, I love that idea.

Yes bash -x would be solved for me by being able to log when certain tasks start.

I never really liked using bash -c to begin with, solving the issues you mentioned would easily be enough to get me to move away from it.

I'm very happy that you found this make_tool.cue experiment helpful to analyze cue cmd. Being an example of how cue cmd could work for this use case is in fact the main reason why it exists. I think you're the first person to look at it so closely, so thank you for your attention. :)

myitcv Aug 29, 2025
Maintainer

> Consider how I might have structured make_tool.cue #vars with this fingerprint feature:

I think we're actually talking about exactly the same concept, just that I'm highlighting that I believe it already exists (in a slightly too brittle form).

Consider a slightly tweaked form of what you provided as an example:

package x

import (
	"tool/cli"
	"tool/exec"
)

_vars: {
	root: exec.Run & {cmd: ["echo", "root"], stdout: string}
	version: exec.Run & {cmd: ["echo", "version"], stdout: string}
}

command: a: {
	vars: _vars
	print: cli.Print & {text: "The version is: \(vars.version.stdout)"}
}

command: b: {
	vars: _vars
	print: cli.Print & {text: "The root path is: \(vars.root.stdout)"}
}

command: composed: {
	a: command.a
	b: command.b
}

Looking at the output with CUE_DEBUG=toolsflow:

tools/flow task dependency graph:
```mermaid
graph TD
  t0("command.composed.a.vars.root [Ready]")
  t1("command.composed.a.vars.version [Ready]")
  t2("command.composed.a.print [Waiting]")
  t2-->t1
  t3("command.composed.b.print [Waiting]")
  t3-->t0
```
tools/flow task dependency graph:
```mermaid
graph TD
  t0("command.composed.a.vars.root [Running]")
  t1("command.composed.a.vars.version [Terminated]")
  t2("command.composed.a.print [Ready]")
  t2-->t1
  t3("command.composed.b.print [Waiting]")
  t3-->t0
```
The version is: version

tools/flow task dependency graph:
```mermaid
graph TD
  t0("command.composed.a.vars.root [Terminated]")
  t1("command.composed.a.vars.version [Terminated]")
  t2("command.composed.a.print [Running]")
  t2-->t1
  t3("command.composed.b.print [Ready]")
  t3-->t0
```
tools/flow task dependency graph:
```mermaid
graph TD
  t0("command.composed.a.vars.root [Terminated]")
  t1("command.composed.a.vars.version [Terminated]")
  t2("command.composed.a.print [Terminated]")
  t2-->t1
  t3("command.composed.b.print [Running]")
  t3-->t0
```
The root path is: root

tools/flow task dependency graph:
```mermaid
graph TD
  t0("command.composed.a.vars.root [Terminated]")
  t1("command.composed.a.vars.version [Terminated]")
  t2("command.composed.a.print [Terminated]")
  t2-->t1
  t3("command.composed.b.print [Terminated]")
  t3-->t0
```

What this shows is that the root and version graph only get run once (albeit with slightly unintuitive task paths). Which is, I believe, what you wanted to achieve, despite them apparently being composed twice, correct?

What's totally non-obvious about the above is how/why it has happened. What's even less obvious is how/why it can easily be broken. And this is exactly the point I'm making with respect to the brittleness of this pattern, and the lack of good documentation offering examples of best practice.

I'm also not clear whether this approach is "better" or "worse" than a purer dependency injection approach:

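package x

import (
	"tool/cli"
	"tool/exec"
)

// Shape of the injected dependency: every field exposes a stdout string,
// and both root and version must be present.
_varsShape: {
	[string]: {
		stdout: string
		...
	}
	root:    _
	version: _
}

// The concrete tasks, declared once at the top level.
vars: _varsShape & {
	root: exec.Run & {cmd: ["echo", "root"], stdout: string}
	version: exec.Run & {cmd: ["echo", "version"], stdout: string}
}

// Each command only declares what it needs via _varsArgs.
command: a: {
	_varsArgs: _varsShape
	print: cli.Print & {text: "The version is: \(_varsArgs.version.stdout)"}
}

command: b: {
	_varsArgs: _varsShape
	print: cli.Print & {text: "The root path is: \(_varsArgs.root.stdout)"}
}

// The composed command injects the same vars into both a and b.
command: composed: {
	_vars: vars
	a: command.a & {_varsArgs: _vars}
	b: command.b & {_varsArgs: _vars}
}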

I'm certainly not trying to answer those questions here; rather, this exchange is the basis for more good feedback in the analysis.

If I can ask santa for anything I'd add "ability to trigger a waiting task to start after a matching substring is found in the stdout of an exec task", because it would allow me to eliminate task.test.ready which repeatedly pings the http port of the server until it returns ready. I'd prefer to trigger the tests based on when the server's ready message appears in stdout, because the way it is now the tests wait 5 seconds even if the server unexpectedly exits immediately.

You'll be pleased to hear I already have this captured in a slightly more generic sense.

infogulch Aug 29, 2025
Author

Wow, interesting that my experience led me so far down the dependency injection approach that I didn't realize this simple example actually works today. Maybe this explains my confusion as I was developing make_tool.cue: I must have tripped over the brittle barrier at some point and concluded that any experience where it only ran tasks once was a mirage. Now I'm curious about where the "brittle" line actually is.

With this new understanding I might reframe my position as one of two extremes:

EITHER: "References have no semantic meaning" is how it should work, and we need some separate orthogonal mechanism for deduplicating tasks (e.g. the fingerprint field).
OR: References have a very clearly defined semantic meaning that is reliable and useful in practice for controlling when tasks run once or multiple times.

You'll be pleased to hear I already have this captured in a slightly more generic sense.

💯 🎉

DavidGamba · 2025-08-22T16:14:32Z

DavidGamba
Aug 22, 2025

A few comments from my experience building task executors:

I didn't know cue cmd had mermaid output support. In general I usually generate a dot diagram since I think it is more ubiquitous.
When building task executors that shell out to a command, I find it useful to unit test things like exit codes, text outputs that I expect/rely on downstream, etc.
My personal projects use the Go context to inject mocks. I wonder if there is a way for CUE to enable Unit testing of some tasks.
Reference: https://github.com/DavidGamba/dgtools/tree/master/run#testing
The only easy way (manageable in a generic sense) I have found to build caching is to assume the task itself owns its caching so it is idempotent.
For tasks running in local environments the easiest is to compare timestamps for a given target output, like Make does. If the input timestamp is older than the output target do nothing.
Most of my cacheable tasks print out a dummy target file with the timestamp to allow for this (not using cue cmd, but in general).
CI environments are always fresh, so they always execute in full. Caching by timestamps in CI systems is hacky, so there you have to implement more complex methods.

I wonder what is the expected scope of cue cmd. So far I have only used it to generate multi document YAML files from my kubernetes or argo workflows cue code. I have never personally considered it as a general purpose build system.

My biggest issue with most general purpose build systems is that their CLI argument parsers suck and they have no autocompletion for options or a way to nicely complete commands and subcommands.
I have been experimenting on the space here: https://github.com/DavidGamba/dgtools/tree/master/bake

infogulch Aug 22, 2025
Author

Hey thanks for sharing your experience as well.

mermaid

Mermaid is about the same as dot for this purpose. Maybe mermaid is a bit easier to render depending on the tools you have available.

I wonder if there is a way for CUE to enable Unit testing of some tasks.

Task testing is a good idea, I like that. I found that by breaking commands down into small commands, it's easy to validate each step individually, which is basically manual testing.

assume the task itself owns its caching so it is idempotent

Yeah, this is probably a good strategy in general. I found that if I split everything down into small reusable subcommands and compose them into larger commands, then there are a lot of duplicated tasks in the big composite commands that I would like to eliminate. I think your approach should be the default, but there should also be a way to override this to deduplicate tasks as needed.

I wonder what is the expected scope of cue cmd.

I think you have a good understanding of what cue cmd is currently good at. I think it could be used as a more general-purpose build system with some changes like the ones I mentioned.

My biggest issue with most general purpose build systems is that their CLI argument parsers suck

Yeah, or they're nonexistent, as in cue. I tried to get this changed in cue but contributors weren't interested. Autocompletion for cue would be good enough for me; maybe that can happen with the cue lsp.

myitcv Aug 29, 2025
Maintainer

@DavidGamba - thanks for the comments!

I didn't know cue cmd had mermaid output support. In general I usually generate a dot diagram since I think it is more ubiquitous.

This is only via a debug flag today (CUE_DEBUG=toolsflow) and is extremely primitive. My notes on cue cmd v2 pick up the point about visualisation in much more detail, and hint at the different options (without wishing to overstep into the design phase).

When building task executors that shell out to a command, I find it useful to unit test things like exit codes, text outputs that I expect/rely on downstream, etc.
My personal projects use the Go context to inject mocks. I wonder if there is a way for CUE to enable Unit testing of some tasks.
Reference: https://github.com/DavidGamba/dgtools/tree/master/run#testing

This is well framed. I've added a point about unit testing of tasks to the analysis.

The only easy way (manageable in a generic sense) I have found to build caching is to assume the task itself owns its caching so it is idempotent.

This concurs with my understanding.

For tasks running in local environments the easiest is to compare timestamps for a given target output, like Make does. If the input timestamp is older than the output target do nothing.
Most of my cacheable tasks print out a dummy target file with the timestamp to allow for this (not using cue cmd, but in general).
CI environments are always fresh, so they always execute in full. Caching by timestamps in CI systems is hacky, so there you have to implement more complex methods.

General-purpose caching in build systems is often unsafe because it relies on an approximation of a process's true inputs, from brittle file timestamps in make to more robust but still potentially incomplete content hashing in tools like BuildKit. This inherent risk becomes a certainty when remote state is involved, as the build system cannot track external network dependencies, making the cache non-deterministic and fundamentally unreliable.

For that reason, I don't think cue cmd v2 can ever have such behaviour as a default.

But that doesn't preclude implementing tasks that achieve the same thing and using them as primitives to wrap another task. For example, imagine in a cue cmd v2 world a "function" pattern style primitive that allows the caller to run a task only if certain timestamp conditions hold, à la make.
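
To make the "approximation of a process's true inputs" point concrete, here is a small Go sketch of a content-hash cache key. It is only an illustration: `fingerprint` is a hypothetical helper and does not correspond to any existing cue cmd feature. The key changes when a declared input changes, but anything the task reads without declaring it never reaches the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fingerprint hashes the declared inputs (name -> content) in a stable
// order, so the same set of inputs always yields the same key.
func fingerprint(inputs map[string][]byte) string {
	names := make([]string, 0, len(inputs))
	for n := range inputs {
		names = append(names, n)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, n := range names {
		// Length-prefix each field so adjacent names and contents
		// cannot run together into an ambiguous byte stream.
		fmt.Fprintf(h, "%d:%s%d:", len(n), n, len(inputs[n]))
		h.Write(inputs[n])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	v1 := fingerprint(map[string][]byte{"main.go": []byte("v1")})
	v2 := fingerprint(map[string][]byte{"main.go": []byte("v2")})
	fmt.Println(v1 == v2) // false: a declared input changed

	// An undeclared input, such as a file fetched over the network while
	// the task runs, is invisible here, so the key would stay the same
	// even though the task's real inputs differ.
}
```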

I wonder what is the expected scope of cue cmd. So far I have only used it to generate multi document YAML files from my kubernetes or argo workflows cue code. I have never personally considered it as a general purpose build system.

Interestingly (as commented elsewhere) I think your primary use case is better served by @export.

My biggest issue with most general purpose build systems is that their CLI argument parsers suck and they have no autocompletion for options or a way to nicely complete commands and subcommands. I have been experimenting on the space here: https://github.com/DavidGamba/dgtools/tree/master/bake

This is indeed an interesting space.

Some CLIs are never going to "improve" - they exist more as historical artefacts than anything, and trying to change them would cause more problems than it will solve. Take for example ls. The best I think we can do in this situation is look to build abstractions (in CUE!) that make our lives using ls generally less brittle and more composable (consider the situation where the arguments to ls are being composed from two places).

For green-field CLIs I'm actually of the view that the flags and arguments consumed by a CLI must be a well-defined translation of a schema. i.e. the CLI is truly schema first. The schema acts as the source of truth for everything. In the rawest sense, such a CLI would be invocable with zero args or flags, just a config file read over stdin. Using the schema we can validate the correctness of the config. As a human I might choose to invoke the CLI using flags as a convenience, but these flags and arguments are immediately translated to a config file equivalent, and the previously described validate step proceeds. From a cue cmd (v2) perspective, such a CLI would only ever be invoked with zero args or flags, i.e. the pure config file approach (which could be rendered for the human as an equivalent flag and arg-based call if required).
