Description
Several times now, I have ingested a large set of repositories, looked at the data, and noticed oddities caused by a repository that should not have been included in the first place.
Is there a workflow to remove a repository from the database and then rerun the plotting?
I don't know of such a workflow, so currently I remove the repository by hand, delete the database, and restart ingestion from scratch. This works, but it is annoying when ingestion is slow (several minutes on large repository sets).
I thought about opening the database with sqlite and running a DELETE on all raw_commits rows that come from this repository. However, if I understand correctly, the plots are generated from the authors table, which I would then need to update with new aggregates, and I don't know an easy way to do that.
Assuming this does not exist yet, I propose a command, fornalder reanalyze foo.db, that drops the current authors table and recomputes it from the raw_commits table in its current state.
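To make the idea concrete, here is a minimal sketch of the two steps (delete one repository's raw commits, then rebuild the derived table) using Python's built-in sqlite3 module. It assumes a HYPOTHETICAL, simplified schema: the column names (repository, commit_hash, author_email, author_time), the aggregates stored in authors, and the helper names remove_repository and reanalyze are illustrative only. fornalder's real tables almost certainly differ, so the queries would need to match its actual schema.

```python
import sqlite3

# HYPOTHETICAL, simplified schema for illustration only; fornalder's real
# raw_commits/authors tables likely use other columns and more aggregates.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_commits (
    repository   TEXT    NOT NULL,
    commit_hash  TEXT    NOT NULL,
    author_email TEXT    NOT NULL,
    author_time  INTEGER NOT NULL
);
"""


def remove_repository(conn: sqlite3.Connection, repository: str) -> int:
    """Delete every raw commit ingested from `repository`; return rows removed."""
    cur = conn.execute("DELETE FROM raw_commits WHERE repository = ?", (repository,))
    conn.commit()
    return cur.rowcount


def reanalyze(conn: sqlite3.Connection) -> None:
    """Drop the derived authors table and rebuild it from raw_commits."""
    conn.execute("DROP TABLE IF EXISTS authors")
    # Recompute per-author aggregates from whatever raw commits remain.
    conn.execute(
        """
        CREATE TABLE authors AS
        SELECT author_email,
               MIN(author_time) AS first_commit,
               MAX(author_time) AS last_commit,
               COUNT(*)         AS num_commits
        FROM raw_commits
        GROUP BY author_email
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real run would open e.g. "foo.db"
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO raw_commits VALUES (?, ?, ?, ?)",
        [
            ("good.git", "a1", "alice@example.com", 100),
            ("good.git", "a2", "alice@example.com", 200),
            ("bad.git", "b1", "bob@example.com", 150),
        ],
    )
    print(remove_repository(conn, "bad.git"))  # prints 1
    reanalyze(conn)
    print(conn.execute("SELECT * FROM authors").fetchall())
    # prints [('alice@example.com', 100, 200, 2)]
```

Because reanalyze only reads raw_commits, it is safe to run repeatedly, so a repo-remove style command could simply be the DELETE followed by the same rebuild.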
(Another option, of course, would be a fornalder repo-remove foo.db repo.git command that removes a repository's data from the database instead of adding it, as fornalder ingest foo.db repo.git does. But that sounds like more work.)