[Bundler] RFC Proposal: Machine-readable output for update command #43

@zofrex

High-level proposal

Add a new flag to the update command to produce machine-readable output, for better integration of Bundler with tooling. For example:

$ bundle update --machine-readable
[
  {
    "name": "nokogiri",
    "old": "1.13.6",
    "new": "1.13.8"
  },
  {
    "name": "rails",
    "old": "6.1.6",
    "new": "7.0.3.1"
  },
  {
    "name": "tzinfo",
    "old": "2.0.4",
    "new": "2.0.5"
  }
]

Motivation

It would be really useful to have machine-readable output from the update command.

This would make it easier for tooling (CI/CD, security inspections, automated updaters, posting summary messages) to consume and understand the changes made by dependency updates. Currently the only options are parsing the human-readable output from bundle update (which is not guaranteed to stay the same) or diffing the lockfile (whose format, as far as I know, is not guaranteed to stay stable either).

For example, I have a GitHub Action that runs daily and checks for gem updates on my projects. If there are updates, it creates a PR and writes a summary of the change in the PR text. I could make that summary easier to read if the output from bundle update was more structured. I could also potentially add a feature to auto-merge any update that only has minor version updates and passes tests. I could add an action to post to Slack when major updates are pending, so I know to check changelogs and see if I need to make any changes before updating.
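As a rough illustration of the auto-merge idea, here is a minimal sketch of how a consumer might classify each change. It assumes the JSON shape proposed at the top of this issue ("name", "old", "new"), which does not exist yet, and the helper names bump_kind and auto_mergeable? are made up for this example. Reading "minor version updates" as "anything that is not a major bump or a downgrade" is also just one possible interpretation.

```ruby
require "rubygems" # normally loaded already; provides Gem::Version

# Classify a version change using RubyGems' own version parser.
# Hypothetical helper, not part of Bundler.
def bump_kind(old_version, new_version)
  old_v = Gem::Version.new(old_version)
  new_v = Gem::Version.new(new_version)
  return :none if new_v == old_v
  return :downgrade if new_v < old_v

  old_major, old_minor = old_v.segments
  new_major, new_minor = new_v.segments
  return :major if new_major != old_major
  # A missing minor segment (e.g. version "2") is treated as 0.
  return :minor if new_minor.to_i != old_minor.to_i

  :patch
end

# True when every change is a minor or patch bump.
def auto_mergeable?(changes)
  changes.all? do |change|
    %i[minor patch].include?(bump_kind(change["old"], change["new"]))
  end
end

changes = [
  { "name" => "nokogiri", "old" => "1.13.6", "new" => "1.13.8" },
  { "name" => "rails", "old" => "6.1.6", "new" => "7.0.3.1" }
]

changes.each do |change|
  puts "#{change["name"]}: #{bump_kind(change["old"], change["new"])}"
end
puts "auto-merge: #{auto_mergeable?(changes)}"
```

With structured output, the Slack notification for pending major updates would be the same check with the condition inverted.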

This is a sister issue to ruby/rubygems#5913. That issue would make it easier for humans who run bundle update directly to understand what has changed, whereas machine-readable output would make it easier for automated systems that invoke bundle update to produce similar, more useful summaries.

A machine-readable option would also make it easier for users who are using bundle update directly, but have different requirements or opinions to the solution in ruby/rubygems#5913, to produce output summaries in a format or slicing of their choosing.

Key Challenges

Figuring out the changeset would already be required for ruby/rubygems#5913, and I believe that part is relatively easy: we have sufficient metadata hanging around after doing an update to understand which gems changed and what the old and new versions are.
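To make the "changeset" concrete, here is a minimal sketch that assumes we can get two plain name-to-version hashes, one from before the update and one from after. How Bundler would actually obtain these internally is not shown, and version_changes is a hypothetical helper, not an existing Bundler method.

```ruby
# Hypothetical helper: compare two {gem name => version string} snapshots
# taken before and after an update and return the gems whose version changed.
def version_changes(before, after)
  changes = after.each_with_object([]) do |(name, new_version), list|
    old_version = before[name]
    # Skip gems that are newly added or unchanged; report only version changes.
    next if old_version.nil? || old_version == new_version

    list << { name: name, old: old_version, new: new_version }
  end
  changes.sort_by { |change| change[:name] }
end

before = { "nokogiri" => "1.13.6", "rails" => "6.1.6", "rake" => "13.0.6" }
after  = { "nokogiri" => "1.13.8", "rails" => "7.0.3.1", "rake" => "13.0.6" }

version_changes(before, after).each do |change|
  puts "#{change[:name]}: #{change[:old]} -> #{change[:new]}"
end
```

Newly added and removed gems are ignored here for simplicity; a real changeset would probably want to report those too.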

Producing machine-readable output from that data is also relatively easy. JSON seems like a logical choice. Apparently we cannot use the json library? But producing simple JSON by hand is not hard, so that is not a blocker. YAML is another option, but more tooling uses JSON generally speaking (e.g. jq) and it has fewer footguns for both writing and parsing. Another perfectly decent option would be to produce column output, CSV, or some other simpler format, but I think a more structured format such as JSON or YAML will make it easier to extend in the future without breaking consumers.
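To show that hand-written JSON is manageable for this shape of data, here is a minimal sketch that serialises a flat list of changes without requiring the json library. json_string and changes_to_json are hypothetical names, and the sketch only handles an array of objects with string values, which is all the example output above needs.

```ruby
# Escape a string for use inside a JSON string literal, without the json library.
def json_string(value)
  escaped = value.gsub(/["\\\x00-\x1f]/) do |char|
    case char
    when '"'  then '\\"'
    when "\\" then "\\\\"
    when "\n" then '\\n'
    when "\t" then '\\t'
    else format('\\u%04x', char.ord) # any other control character
    end
  end
  %("#{escaped}")
end

# Serialise an array of {"name", "old", "new"} hashes as pretty-printed JSON.
def changes_to_json(changes)
  entries = changes.map do |change|
    fields = %w[name old new].map do |key|
      "    #{json_string(key)}: #{json_string(change.fetch(key))}"
    end
    "  {\n#{fields.join(",\n")}\n  }"
  end
  entries.empty? ? "[]" : "[\n#{entries.join(",\n")}\n]"
end

changes = [
  { "name" => "nokogiri", "old" => "1.13.6", "new" => "1.13.8" },
  { "name" => "tzinfo", "old" => "2.0.4", "new" => "2.0.5" }
]
puts changes_to_json(changes)
```

Gem names and versions are unlikely to contain characters that need escaping, but escaping anyway keeps the output valid if free-text fields such as warnings are added later.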

The hardest part would be producing only the machine-readable output, with no human-readable messages mixed in, and that's why I am opening an issue to discuss this rather than going straight to writing an RFC.

A lot of different parts of code can and do print to stdout in Bundler, and it would not be trivial to add a flag to bundle update that makes all the output machine readable.

Potential approaches to producing only machine-readable output

There is potentially an easy, albeit quite horrible, way to get machine-readable output that won't have other human-readable messages dropped into the middle of it. We could define a special marker, e.g. "MACHINE-READABLE-STARTS-HERE", that is printed at the very end, after all human-readable output, and is followed by the machine-readable summary. This adds an extra hurdle for anyone wanting to consume it: rather than being able to write e.g. bundle update --machine-readable | jq . they would have to write something like bundle update --machine-readable | sed -e '1,/MACHINE-READABLE-STARTS-HERE/ d' | jq . instead.
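For consumers written in Ruby rather than shell, the same marker-based extraction might look like the sketch below. The marker string and the JSON payload are the hypothetical ones from this proposal, and machine_readable_part is a made-up helper name. A consumer script is free to use the json library even if Bundler itself cannot.

```ruby
require "json"

SENTINEL = "MACHINE-READABLE-STARTS-HERE"

# Return the parsed JSON that follows the first marker line,
# or nil if the marker never appears in the output.
def machine_readable_part(output)
  lines = output.lines
  index = lines.index { |line| line.chomp == SENTINEL }
  return nil if index.nil?

  JSON.parse(lines.drop(index + 1).join)
end

# Simulated output of the proposed command, for illustration only.
output = <<~OUT
  Fetching gem metadata from https://rubygems.org/
  Bundle updated!
  MACHINE-READABLE-STARTS-HERE
  [{"name": "rails", "old": "6.1.6", "new": "7.0.3.1"}]
OUT

machine_readable_part(output).each do |change|
  puts "#{change["name"]}: #{change["old"]} -> #{change["new"]}"
end
```

Every consumer would need to carry this kind of wrapper, which is the extra hurdle described above.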

The next easiest, and far nicer method for consumers, would be to carefully pass the option to suppress human-readable messages down through every code path that an update calls that might print out a message. This goes a little deep in places but would be somewhat feasible, although not massively tidy — not all of the code passes down options or context, so that will need to be added in places. This, however, might be fragile: new output messages might be added to other functions in the critical path of update which would then break the machine-readable output.

Conceptually the nicest way to handle this would be to upgrade the message output framework itself in Bundler to have two modes. As all output is already mediated through a single output framework within Bundler, this would then mean that a single switch for human/machine could reliably capture all messages, even those added in the future. This would require substantial changes to that framework, though. But with this method we probably have the best chance to not just capture the summary of version changes, but any other messages relevant to the update. It would also be the most robust and least likely to break.
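A very rough sketch of what a two-mode output object could look like follows. DualModeUI is a hypothetical class written for this discussion and does not reflect Bundler's actual output classes or method signatures. It uses the json library only to keep the example short.

```ruby
require "json"

# Hypothetical two-mode output object: prints messages immediately in
# human mode, collects them as structured events in machine mode.
class DualModeUI
  def initialize(machine_readable:, out: $stdout)
    @machine_readable = machine_readable
    @out = out
    @events = []
  end

  def info(message)
    emit("info", message)
  end

  def warn(message)
    emit("warning", message)
  end

  # Print all collected events as one JSON document; a no-op in human mode.
  def flush
    return unless @machine_readable

    @out.puts(JSON.generate(@events))
  end

  private

  def emit(level, message)
    if @machine_readable
      @events << { "level" => level, "message" => message }
    else
      @out.puts(message)
    end
  end
end

ui = DualModeUI.new(machine_readable: true)
ui.info("Fetching gem metadata")
ui.warn("tzinfo went down a version")
ui.flush
```

Because callers only ever talk to info and warn, any message added in the future is captured in machine mode without touching the call sites, which is the robustness argument made above.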

Key questions

In my mind the two key questions here are:

  1. Would we want to make all output machine readable (including warnings about gems that failed to update, warnings about gems that went down in a version, post-install messages, messages about not updating particular groups, messages about platform mismatches, etc, etc), or would we settle for just the info on version changes?

If we do want to capture e.g. warnings and all, that pushes me more towards the larger change of updating how messages are output, so these can all be captured — otherwise it will require a lot of changes on an ad-hoc basis that I think will be more work in the long run, and fragile.

If there is less value in anything except version updates, maybe the amount of changes needed for producing only those without reworking how messages are printed isn't too high.

  2. Would there be any interest in making other Bundler commands have machine-readable output?

I haven't yet spent any time thinking about whether there might be value for tooling and other applications in other Bundler commands also having a stable, machine-readable interface. If there is, that would be another reason to change the output handling, so this could be done more easily and reliably on other commands as well.

If we think that only the update command would benefit from machine-readable output, then there's less value in reworking how messages are printed in general throughout Bundler.

Answers to these questions will help inform the big decision of how to handle the outputting problem.
