[image builder] count failures and differentiate between system and user #15573

Open
@kylos101

Description

Is your feature request related to a problem? Please describe

Image builds fail, but we don't know how many of those failures are caused by the system and how many by user input.

Describe the behaviour you'd like

  • when the system fails, count it as a system failure, either as a label on an existing metric or as one or more new metrics. The intent is to let us tell how many image builds fail overall because of system failures.
  • use the build output (which is on stderr) to identify user errors
    • create a set of parsing rules for known user errors, e.g. a RUN command in the Dockerfile exiting with an error. The build output for such a failure contains a log line with ">>> RUN".

    • we can add to this list of parsing rules incrementally whenever we identify new user errors. To make that easy, the rule list should be specified in config.

  • an image build can fail if we cannot authenticate with a private registry to pull a private base image. This should be considered a user failure (private registry support was introduced in Support private registries #8550)
  • there are likely other paths that require instrumentation, ☝️ are just initial thoughts
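
To make the proposal above more concrete, here is a rough sketch of config-driven classification. It is not taken from the image builder code: the rule names, the JSON layout, the registry-auth pattern, and the helpers `load_rules`, `classify`, and `record_failure` are all hypothetical. Only the ">>> RUN" marker comes from this issue. A plain `Counter` stands in for a real metric with `kind` and `reason` labels.

```python
import json
import re
from collections import Counter

# Hypothetical rule list. In practice this would be read from the
# component's config so new user-error patterns can be added without
# a code change. Only ">>> RUN" is taken from the issue text; the
# registry pattern is an assumed example.
RULES_JSON = """
[
  {"name": "dockerfile_run_failed", "pattern": ">>> RUN"},
  {"name": "registry_auth_failed",
   "pattern": "(?i)unauthorized|authentication required"}
]
"""


def load_rules(raw):
    """Parse the JSON rule list into (name, compiled regex) pairs."""
    return [(r["name"], re.compile(r["pattern"])) for r in json.loads(raw)]


def classify(stderr, rules):
    """Return ("user", rule name) for the first matching rule.

    Assumption: output that matches no rule is counted as a system
    failure with reason "unmatched".
    """
    for name, pattern in rules:
        if pattern.search(stderr):
            return "user", name
    return "system", "unmatched"


# Stand-in for a metric such as a failure counter with two labels.
failures = Counter()


def record_failure(stderr, rules):
    """Classify one failed build and increment the matching counter."""
    kind, reason = classify(stderr, rules)
    failures[(kind, reason)] += 1
    return kind, reason


if __name__ == "__main__":
    rules = load_rules(RULES_JSON)
    record_failure("Dockerfile:3\n>>> RUN exit 1\n", rules)
    record_failure("context deadline exceeded", rules)
    for (kind, reason), count in sorted(failures.items()):
        print(kind, reason, count)
```

Treating every unmatched failure as a system failure will overcount system failures until the rule list matures. A separate "unknown" kind may be a better default while rules are still being collected.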

Additional context

Related: #15572
