This repository was archived by the owner on Jan 3, 2018. It is now read-only.

What should we teach about provenance? #429

@gvwilson

It used to be easy: when we taught version control with Subversion, we told people that if they put:

$Revision:$

in a file, and set the file's properties correctly, Subversion would automatically update that string every time the file was changed so that it read:

$Revision: 123$

(or whatever the revision number was). This worked in pretty much any text file, so they could get the version control system to keep track of files' provenance for them. In particular, you could do this in a program (I'll show it in Python, but it works in any language):

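# Subversion rewrites this keyword string on every commit.
my_version = "$Revision: 123$"

def main(args):
    # Drop the surrounding "$" and keep just the revision number.
    version = my_version.strip("$").split()[1]
    print("Results produced by my_program version", version)
    do_calculations_and_print_output()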

But now we're using Git, and that doesn't work, because Git identifies files using hashes of their contents, and if you modify a string in a file, its hash changes, and if that happens during a commit, it can rupture the spacetime continuum. @jiffyclub wrote a blog post a while back about a workaround, but it's Python-specific, and a bit clumsy compared to the old SVN way of doing things. What can/should we teach people about using the version control system to do these kinds of things?
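To make the hashing point concrete, here is a minimal sketch of how Git derives a file's ID from its contents. It assumes a repository using Git's default SHA-1 object format, and `git_blob_hash` is a name made up for this illustration, not part of Git or any library. Changing the keyword string inside the file gives the file a different ID:

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID Git would give a file (blob) with these contents.

    Hypothetical helper for illustration; assumes the SHA-1 object format.
    """
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same one-line file before and after keyword expansion.
before = b'my_version = "$Revision:$"\n'
after = b'my_version = "$Revision: 123$"\n'

print(git_blob_hash(before))
print(git_blob_hash(after))  # differs from the line above
```

The result should match what `git hash-object` reports for a file with the same bytes, which is why a tool that rewrote the string as part of a commit would change the very ID it was trying to record.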
