a workflow to remove a repository after ingestion? #15

@gasche

Several times now, I have ingested a large set of repositories, looked at the data, and noticed oddities caused by a repository that should not have been there in the first place.

Is there a workflow to remove a repository from the database, and rerun the plotting?

Currently I don't know of such a workflow, so I manually remove the repository, delete the database, and restart ingestion from scratch. This is ok, but it can be annoying when ingestion is slow (several minutes on large repository sets).

I thought about running sqlite on the database and doing a `DELETE` on all `raw_commits` rows coming from this directory. However, if I understand correctly, the plotting data comes from the `authors` table, which I would then need to update with new aggregates, and I don't know how to do that easily.
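For illustration, the manual `DELETE` could look roughly like the sketch below. The `raw_commits` table name comes from this issue, but the column names (`id`, `repo_name`, `author_name`, `author_time`) and the `remove_repository` helper are assumptions made up for the example, not fornalder's actual schema or API; check the real schema with `.schema raw_commits` first.

```python
import sqlite3


def remove_repository(conn, repo_name):
    """Delete every raw_commits row for one repository and return the row count.

    Assumes a hypothetical `repo_name` column; the real schema may differ.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM raw_commits WHERE repo_name = ?", (repo_name,)
        )
    return cur.rowcount


# Demo on an in-memory database with a made-up schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_commits ("
    "id TEXT PRIMARY KEY, repo_name TEXT, author_name TEXT, author_time INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_commits VALUES (?, ?, ?, ?)",
    [
        ("c1", "wanted.git", "alice", 100),
        ("c2", "unwanted.git", "bob", 200),
        ("c3", "unwanted.git", "carol", 300),
    ],
)
conn.commit()

removed = remove_repository(conn, "unwanted.git")
remaining = conn.execute("SELECT COUNT(*) FROM raw_commits").fetchone()[0]
print(removed, remaining)
```

This only cleans up `raw_commits`; any aggregates derived from it would still be stale afterwards.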

Assuming this does not currently exist, my proposal would be to have a command `fornalder reanalyze foo.db` that would drop the current `authors` table and recompute it from the `raw_commits` table as it currently exists.
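To make the proposal concrete, here is a minimal sketch of what such a drop-and-recompute step might do. Everything beyond the two table names is an assumption: the columns, the choice of aggregates (first commit time, last commit time, commit count per author), and the `rebuild_authors` helper are hypothetical stand-ins, since I don't know what fornalder actually stores in `authors`.

```python
import sqlite3


def rebuild_authors(conn):
    """Drop the authors table and recompute it from raw_commits.

    The aggregates below are placeholders for whatever the tool really computes.
    """
    conn.execute("DROP TABLE IF EXISTS authors")
    conn.execute(
        """
        CREATE TABLE authors AS
        SELECT author_name,
               MIN(author_time) AS first_time,
               MAX(author_time) AS last_time,
               COUNT(*)         AS commit_count
        FROM raw_commits
        GROUP BY author_name
        """
    )
    conn.commit()


# Demo on an in-memory database with a made-up schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_commits ("
    "id TEXT PRIMARY KEY, repo_name TEXT, author_name TEXT, author_time INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_commits VALUES (?, ?, ?, ?)",
    [
        ("c1", "a.git", "alice", 100),
        ("c2", "a.git", "bob", 200),
        ("c3", "b.git", "alice", 300),
    ],
)
conn.commit()

rebuild_authors(conn)
rows = conn.execute(
    "SELECT author_name, first_time, last_time, commit_count "
    "FROM authors ORDER BY author_name"
).fetchall()
print(rows)
```

Because the table is rebuilt from whatever `raw_commits` currently contains, running this after deleting a repository's rows by hand would bring the aggregates back in sync.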

(Another option, of course, would be to have a `fornalder repo-remove foo.db repo.git` command that removes a repository from the database, instead of adding it as `fornalder ingest foo.db repo.git` does. But that sounds like more work.)
