[crawler] Capture project details #26

@themightychris

Description

The prototype crawler from #19 only captures a few basic initial project details. Code for Kenya's HURUmap provides a good example of a record with all fields filled in well, and its history already contains some natural changes to them.

What additional details should a v2 capture? Any thoughts on how we should organize them? (TOML has great support for grouping things any number of levels deep.)
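As a sketch of what that grouping could look like, here is a hypothetical record; every key name and value below is illustrative, not a settled schema:

```toml
# Hypothetical v2 index record -- the nesting shows TOML's table
# grouping, not a proposed field list.
[project]
name = "example-project"
description = "Pulled from the GitHub description or README opening"

[project.repository]
url = "https://github.com/example-org/example-project"

[project.health]
license = "MIT"
has-issues = true
contributors = "5-10"  # tiered bucket rather than a raw count

[project.metadata-files]
civic-json = true
publiccode-yaml = false
```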

I don't think we want to capture any details in the index that routinely change day to day in the life of a project (e.g. number of open issues, number of contributors), BUT maybe we do capture things like that as binary or tiered buckets (e.g. `has-issues = true` or `contributors = "5-10"`).
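One way to produce those tiered buckets is a small helper that maps a raw count onto stable range labels; the boundaries below are placeholders, not a proposal:

```python
def tier(count, bounds=(1, 5, 10, 25, 100)):
    """Map a raw count to a bucket label like "5-10" or "100+".

    Bucket boundaries are illustrative; the point is that the index
    value only changes when a project crosses a boundary, not daily.
    """
    if count == 0:
        return "0"
    for lower, upper in zip(bounds, bounds[1:]):
        if lower <= count <= upper:
            return f"{lower}-{upper}"
    return f"{bounds[-1]}+"
```

A diff of the index then only shows a project when it crosses a boundary, keeping the history signal rather than noise.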

I think we should pull in the GitHub description and/or the opening paragraph of the README directly. For other big, wordy things we should record their presence, link to them, and maybe measure their health or summarize them if there's a valuable way to do so (e.g. we can record which license is used and link to it, and we can record which of GitHub's standard community health files are present and link to them).
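GitHub exposes the standard community health files through its REST API's community profile endpoint (`GET /repos/{owner}/{repo}/community/profile`). A sketch of flattening that response into index fields, run here against a canned payload rather than a live call; the exact response shape is an assumption and should be checked against the API docs:

```python
def normalize_profile(profile):
    """Flatten a GitHub community-profile API response into index fields.

    `profile` is the JSON body from GET /repos/{owner}/{repo}/community/profile.
    We record presence and a link for each health file, not its contents.
    """
    files = profile.get("files") or {}
    record = {
        "description": profile.get("description"),
        "health-percentage": profile.get("health_percentage"),
    }
    for name, entry in files.items():
        record[name] = {"present": entry is not None}
        if entry:
            record[name]["url"] = entry.get("html_url")
    return record

# Canned payload standing in for a live API response
sample = {
    "health_percentage": 75,
    "description": "Example project",
    "files": {
        "license": {"html_url": "https://example.com/LICENSE"},
        "code_of_conduct": None,
    },
}
```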

We should also record the presence of any civic.json or publiccode.yaml file, and pull in some or all of their contents into a normalized form.
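Since civic.json and publiccode.yaml use different field names for similar concepts, normalizing means mapping both onto shared keys. A minimal sketch; the specific field mappings (`status` in civic.json vs. `developmentStatus` in publiccode.yaml, `tags` vs. `categories`) are assumptions about the two formats and should be verified against their specs:

```python
def normalize_metadata(civic=None, publiccode=None):
    """Merge civic.json and publiccode.yaml data into one normalized dict.

    Field names mapped here are assumptions about the two formats.
    """
    civic = civic or {}
    publiccode = publiccode or {}
    return {
        "has-civic-json": bool(civic),
        "has-publiccode-yaml": bool(publiccode),
        # Prefer civic.json when both files define the same concept
        "status": civic.get("status") or publiccode.get("developmentStatus"),
        "tags": sorted(set(civic.get("tags", []))
                       | set(publiccode.get("categories", []))),
    }
```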

Metadata

Assignees: no one assigned
Status: Done
Milestone: no milestone