Description
Is your feature request related to a problem? Please describe
Image builds fail, but we don't know how many of those failures are caused by the system versus by user input.
Describe the behaviour you'd like
- When the system fails, count it as a system failure, either as a label on an existing metric and/or as one or more new metrics. The goal is to be able to tell how many image build failures overall are caused by system failures.
- Use the build output (which is on stderr) to identify user errors:
  - Create a set of parsing rules for known user errors, e.g. a RUN command in the Dockerfile exiting with an error. The log for such a failure would contain >>> RUN.
  - We can incrementally add to this list of parsing rules whenever we identify new user errors. For this, it would be useful to specify the rule list in config so it is easy to update.
- An image build can fail if we cannot authenticate with a private registry to pull a private base image; this should be counted as a user failure (introduced in "Support private registries" #8550).
- There are likely other paths that need instrumentation; the points above are just initial thoughts.
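A minimal sketch of the classification described above, assuming failures are counted in a metric labelled by failure type and matching rule (stood in for here by a collections.Counter rather than a real metrics client). Everything named here is hypothetical: RULES_JSON, load_rules, record_failure, the rule names, and the regex patterns. The registry-auth pattern in particular is a placeholder, since the real error text depends on the registry client. Rules are loaded from a JSON string to mirror the "rule list in config" idea, and failures that match no rule are counted as unclassified rather than guessed.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical rule config. In practice this would be read from a config
# file so new user-error patterns can be added without a code change.
# Both patterns are illustrative placeholders, not real log formats.
RULES_JSON = """
[
  {"name": "run_command_failed", "pattern": ">>> RUN"},
  {"name": "registry_auth_failed", "pattern": "(?i)unauthorized|authentication required"}
]
"""


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern


def load_rules(config_text: str) -> list:
    """Compile user-error rules from a JSON config string, keeping their order."""
    return [
        Rule(r["name"], re.compile(r["pattern"], re.MULTILINE))
        for r in json.loads(config_text)
    ]


# Stand-in for a failure metric with (failure_type, rule) labels.
build_failures = Counter()


def classify(stderr: str, rules: list) -> tuple:
    """Return ("user", rule_name) for the first matching rule, else unclassified."""
    for rule in rules:
        if rule.pattern.search(stderr):
            return ("user", rule.name)
    return ("unclassified", "none")


def record_failure(stderr: str, rules: list, system_error: bool = False) -> tuple:
    """Count one failed build, labelled by who caused it.

    Code paths that detect their own (system) failures pass system_error=True;
    otherwise the stderr build output is matched against the user-error rules.
    """
    key = ("system", "none") if system_error else classify(stderr, rules)
    build_failures[key] += 1
    return key
```

For example, after rules = load_rules(RULES_JSON), calling record_failure on stderr that contains >>> RUN returns ("user", "run_command_failed"). Because rules are checked in config order and the first match wins, more specific rules should be listed first.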
Additional context
Related: #15572