
Devise a robust, transactional system for "publishing" repository versions #1236

@horazont


Summary

Many editor actions involve sending or processing diffs (i.e. the changes between the state before and after an editor action). For instance, when updating a XEP, the editor generally sends an update email informing the community about the changes.

There is tooling which analyses the current state of a checkout of the xeps repository and dumps it into a machine-readable format (make build/xeplist.xml). There are also various tools which act on this state or on the difference between two such states (tools/archive.py, tools/send-updates.py).
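
As a minimal sketch of comparing two such states, the Python below parses two xeplist.xml documents and lists the XEPs whose status or version differs. The element names it reads (`number`, `status`, `last-revision/version`) are an assumption about the xeplist.xml schema, and `parse_xeplist` and `diff_states` are hypothetical helpers, not part of the existing tools.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: every <xep> element in xeplist.xml has <number>, <status>
# and <last-revision><version> children. Verify against the real output
# of `make build/xeplist.xml` before relying on this.
Entry = Tuple[str, str]  # (status, version)
State = Dict[str, Entry]  # XEP number -> entry
Change = Tuple[str, Optional[Entry], Optional[Entry]]


def parse_xeplist(xml_text: str) -> State:
    """Map each XEP number to its (status, version) pair."""
    root = ET.fromstring(xml_text)
    state: State = {}
    for xep in root.iter("xep"):
        number = xep.findtext("number", default="").strip()
        status = xep.findtext("status", default="").strip()
        version = xep.findtext("last-revision/version", default="").strip()
        state[number] = (status, version)
    return state


def diff_states(old: State, new: State) -> List[Change]:
    """List (number, old entry, new entry) for every XEP that differs.

    An entry is None if the XEP is absent from that state.
    """
    changes: List[Change] = []
    for number in sorted(old.keys() | new.keys()):
        before, after = old.get(number), new.get(number)
        if before != after:
            changes.append((number, before, after))
    return changes


OLD = """<xep-infos><xep><number>1234</number><status>Experimental</status>
<last-revision><version>0.1.0</version></last-revision></xep></xep-infos>"""
NEW = OLD.replace("0.1.0", "1.0.0").replace("Experimental", "Stable")
print(diff_states(parse_xeplist(OLD), parse_xeplist(NEW)))
# [('1234', ('Experimental', '0.1.0'), ('Stable', '1.0.0'))]
```

Reducing each state to a plain dictionary keeps the comparison independent of how the XML happens to be laid out; downstream tools would only see the per-XEP transitions.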

The challenge in the context of automation is to know the "old" and the "new" state to base the comparison on.

Concept

Conceptually, this requires the XEP tooling to know which changes have been formally "published". "Published" changes have been announced to the mailing list and archived into the attic.

Note that "published" is not equivalent to a change being on the main branch of the git repository.

Requirements

We don't yet know what such a system should look like, so instead of concrete steps to implement it, here are the requirements:

  • MUST allow running downstream tools on each transaction, passing them the "published" and "unpublished" states, each in the form of a xeplist.xml file.

    The goal here is to use this system to run tools/archive.py and tools/send-updates.py, and potential future tools.

  • MUST provide a dry-run mode

    During a dry run, changes MUST NOT be marked as published, and downstream tools should be informed in some to-be-defined way (preferably via an environment variable) that they are running in dry-run mode.

  • MUST NOT mark changes as published if any of the dependent tools fail to run

    Example: If the archive.py tool crashes on a diff between commits A and B, B MUST be processed again in the future and MUST NOT be marked as "seen" by the system.

  • MUST NOT batch multiple changes to the same document into the same transaction

    Example: Between the previously published state of the repository and the next run of the tool, XEP-1234 gets updated to 1.0.0 and then to 1.1.0. The tooling MUST handle the transitions to 1.0.0 and 1.1.0 separately.

  • SHOULD NOT send duplicate emails for the same revision

    This may happen depending on how the previous requirements are satisfied (for example, when a transaction is retried after some tools have already run), but SHOULD be avoided if possible.

  • SHOULD ignore any changes not covered by xeplist.xml
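
The first three requirements could be met by a runner along the following lines: a minimal Python sketch that runs each downstream tool on one transaction, passes it the two xeplist.xml paths, signals a dry run via an environment variable, and reports success only if every tool exits with status 0. The variable name `XEP_TOOLING_DRY_RUN`, the `<tool> <published> <unpublished>` calling convention and the `run_tools` and `is_dry_run` helpers are hypothetical; tools/archive.py and tools/send-updates.py do not necessarily accept these arguments today.

```python
import os
import subprocess
import sys
from typing import Sequence

# HYPOTHETICAL: neither the variable name nor the calling convention
# "<tool...> <published xeplist> <unpublished xeplist>" exists yet.
DRY_RUN_VAR = "XEP_TOOLING_DRY_RUN"


def is_dry_run() -> bool:
    """What a downstream tool would call to detect a dry run."""
    return os.environ.get(DRY_RUN_VAR) == "1"


def run_tools(
    tools: Sequence[Sequence[str]],
    published_xeplist: str,
    unpublished_xeplist: str,
    dry_run: bool,
) -> bool:
    """Run every tool for one transaction; True only if all exit with 0.

    The caller may mark the transaction as published only if this returns
    True and dry_run is False.
    """
    env = dict(os.environ)
    if dry_run:
        env[DRY_RUN_VAR] = "1"
    else:
        env.pop(DRY_RUN_VAR, None)
    for tool in tools:
        cmd = [*tool, published_xeplist, unpublished_xeplist]
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            # Stop here: later tools must not act on a failed transaction.
            print(f"{cmd[0]} failed with exit code {result.returncode}", file=sys.stderr)
            return False
    return True
```

Stopping at the first failure keeps later tools (for example the mailer) from acting on a transaction that is about to be retried. Tools that already succeeded will run again on the retry, which is where the duplicate-email concern comes from.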
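
For the "MUST NOT batch" and "SHOULD ignore" requirements, one option is to plan transaction boundaries from per-commit changes: commits that touch nothing in xeplist.xml are never chosen as a boundary, and a new transaction starts whenever a XEP would otherwise change twice. This is a sketch; the input format (a commit id plus the set of XEP numbers whose xeplist.xml entry changed, e.g. computed by diffing consecutive xeplist.xml snapshots) and `plan_transactions` are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

# HYPOTHETICAL input: (commit id, XEP numbers whose xeplist.xml entry
# changed relative to the previous commit), oldest commit first.
CommitChanges = Tuple[str, Set[str]]


def plan_transactions(commits: Sequence[CommitChanges]) -> List[str]:
    """Return the commits at which each transaction should end.

    Within one transaction every XEP changes at most once. Commits that
    change nothing covered by xeplist.xml are never used as a boundary.
    """
    boundaries: List[str] = []
    touched: Set[str] = set()
    last_relevant: Optional[str] = None
    for commit, changed in commits:
        if touched & changed and last_relevant is not None:
            # A XEP would change twice: close the open transaction at the
            # last commit that still belongs to it.
            boundaries.append(last_relevant)
            touched = set()
        if changed:
            touched |= changed
            last_relevant = commit
    if touched and last_relevant is not None:
        boundaries.append(last_relevant)
    return boundaries


commits = [
    ("c1", {"1234"}),          # XEP-1234 -> 1.0.0
    ("c2", set()),             # e.g. a tooling change, ignored
    ("c3", {"1234", "0001"}),  # XEP-1234 -> 1.1.0 and a XEP-0001 update
]
print(plan_transactions(commits))  # ['c1', 'c3']
```

Processing every relevant commit as its own transaction would also satisfy the requirement; grouping only reduces the number of tool runs. Irrelevant commits after the last boundary are simply left for the next run.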
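
To avoid duplicate emails when a transaction is retried, the mailer could keep a record of what it has already announced and check it before sending. A minimal sketch, assuming a hypothetical JSON log keyed by (XEP number, version); neither the file nor the `AnnouncementLog` class exists in the current tooling.

```python
import json
from pathlib import Path
from typing import Set, Tuple

# HYPOTHETICAL: a JSON file listing [number, version] pairs that have
# already been announced. Its location is up to the deployment.


class AnnouncementLog:
    def __init__(self, path: Path) -> None:
        self.path = path
        self.sent: Set[Tuple[str, str]] = set()
        if path.exists():
            self.sent = {(n, v) for n, v in json.loads(path.read_text())}

    def should_send(self, number: str, version: str) -> bool:
        """True if no email for this revision has been recorded yet."""
        return (number, version) not in self.sent

    def record(self, number: str, version: str) -> None:
        """Remember a sent email; call this right after sending succeeds."""
        self.sent.add((number, version))
        # Write a temporary file, then rename it over the log, so a crash
        # mid-write cannot leave a truncated log behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(sorted(self.sent)))
        tmp.replace(self.path)
```

A crash between sending and recording would still cause a resend, so this reduces rather than eliminates duplicates, which matches the SHOULD-level wording of the requirement.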

Additional Notes

  • If the tool from #1238 ("Create tool to facilitate automatic creation of git tags for new XEP releases") turns out to be reliable, it should be evaluated whether a list of "seen" or "published" git tags can be used to keep the state. That would be transparent, neat and easy to implement.
  • Another way to track the "published" state would be a protected branch/head in some git repository. The commit that head points to would be the last published one. This is git-native, which is nice. (But it doesn't natively address the "MUST NOT batch multiple changes" requirement.)
    In this model, a simple transactional system would:
    1. Check out the "published" branch
    2. For each commit in linear order on the main branch (be careful with merges!):
      1. Pull that commit onto the published branch (this must be a fast-forward operation)
      2. Run the tools
      3. On failure, roll back to the previous commit and break out of the loop
    3. Push the published branch state: because we roll back one step on failure, this is always safe
    4. Report any errors $somewhere
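
If the tags from #1238 are used as the state, finding unpublished revisions reduces to checking whether a tag exists for each (XEP, version) pair in the current xeplist.xml. A sketch, assuming a hypothetical tag naming scheme `xep-<number>-v<version>`; the real scheme would be whatever #1238 settles on, and `tag_name`, `existing_tags` and `unpublished` are illustrative helpers.

```python
import subprocess
from typing import Dict, Iterable, List, Set, Tuple


def tag_name(number: str, version: str) -> str:
    """HYPOTHETICAL naming scheme, e.g. xep-1234-v1.1.0."""
    return f"xep-{number}-v{version}"


def existing_tags(repo: str) -> Set[str]:
    """All tag names in the repository, via `git tag --list`."""
    out = subprocess.run(
        ["git", "-C", repo, "tag", "--list"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def unpublished(versions: Dict[str, str], tags: Iterable[str]) -> List[Tuple[str, str]]:
    """(number, version) pairs in the current state that have no tag yet."""
    seen = set(tags)
    return sorted(
        (number, version)
        for number, version in versions.items()
        if tag_name(number, version) not in seen
    )
```

On its own this only inspects a single snapshot, so a version that was superseded between two runs would never appear; it would still need the per-commit processing from the requirements above.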
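
The branch-based steps could look roughly like this in Python, driving the git CLI. It is a sketch, not a tested implementation: the branch names `published` and `main`, the remote `origin`, and the `run_tools(previous, current)` callback are assumptions, and the `git` parameter exists only so the loop can be exercised without a real repository. `git rev-list --reverse --first-parent` yields main's commits oldest first and treats each merge as a single step, which is one way to handle the "be careful with merges" caveat.

```python
import subprocess
from typing import Callable, Optional


def run_git(repo: str, *args: str) -> str:
    """Run `git -C repo <args>` and return its stripped stdout."""
    return subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def publish(
    repo: str,
    run_tools: Callable[[str, str], bool],
    published: str = "published",
    main: str = "main",
    remote: str = "origin",
    git: Callable[..., str] = run_git,
) -> Optional[str]:
    """Advance `published` one commit at a time; return an error or None."""
    # 1. Check out the "published" branch.
    git(repo, "checkout", published)
    # Commits on main's first-parent line that are not published yet,
    # oldest first. A merge commit counts as one step.
    pending = git(
        repo, "rev-list", "--reverse", "--first-parent", f"{published}..{main}"
    ).split()
    error: Optional[str] = None
    # 2. Walk the commits in linear order.
    for commit in pending:
        previous = git(repo, "rev-parse", "HEAD")
        # 2.1 Fast-forward only; git fails if the branches have diverged.
        git(repo, "merge", "--ff-only", commit)
        # 2.2 Run the tools on the previous -> current transition.
        if not run_tools(previous, commit):
            # 2.3 Roll back one step and stop processing.
            git(repo, "reset", "--hard", previous)
            error = f"tools failed for {previous}..{commit}"
            break
    # 3. Safe to push: the branch only holds commits whose tools succeeded.
    git(repo, "push", remote, published)
    # 4. Hand the error to the caller, which decides where to report it.
    return error
```

Since every git call uses `check=True`, a diverged `published` branch or a rejected push raises `subprocess.CalledProcessError` instead of being silently ignored.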

Labels: Editor Tooling (issue relates to process/tooling)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions