Hi,
Every time one runs `blogdown::build_site()`, https://github.com/codemeta/codemeta/raw/master/crosswalk.csv is downloaded 16 times.
After several builds in a short time, GitHub rate-limits these requests, which makes the build fail.
Do you know if there is a way to make blogdown cache the crosswalk table across page builds?
cboettig commented on Mar 23, 2020
@progval Yes, thanks for the ping. Simple question, many answers:
For a simple fix, rebuilding the site with `serve_site()` instead of `build_site()` will tell blogdown not to re-render the `.Rmd` files in the `content` dir, but instead stick with the already-knitted HTML outputs from them.
Second, yeah, we could avoid having each of the crosswalk pages do its own download, or we could have R cache those downloads (e.g. by wrapping the download URL in `pins::pin()` or manually caching a copy), in https://github.com/codemeta/codemeta.github.io/blob/hugo/content/crosswalk/datacite.Rmd#L14
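To make the "manually caching a copy" option concrete, here is a minimal sketch in base R. The helper names `is_stale()` and `cached_download()`, the cache directory name `codemeta-site`, and the 24-hour expiry are all assumptions for illustration, not anything that exists in the site today. The cache lives outside the R session's temp dir so that it can survive across separate page renders.

```r
# Hypothetical helper: TRUE if the cached copy is missing or older than max_age_hours.
is_stale <- function(path, max_age_hours) {
  if (!file.exists(path)) return(TRUE)
  age <- difftime(Sys.time(), file.mtime(path), units = "hours")
  as.numeric(age) > max_age_hours
}

# Hypothetical helper: download `url` at most once per `max_age_hours`
# and return the path of the local copy.
# Assumption: tools::R_user_dir() is available (R >= 4.0.0); pass any other
# persistent directory as `cache_dir` on older versions.
cached_download <- function(url,
                            cache_dir = tools::R_user_dir("codemeta-site", "cache"),
                            max_age_hours = 24,
                            fetch = utils::download.file) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (is_stale(dest, max_age_hours)) {
    # `fetch` is injectable so the helper can be tried out without network access.
    fetch(url, dest, quiet = TRUE)
  }
  dest
}
```

Each crosswalk page could then read the table with something like `read.csv(cached_download("https://github.com/codemeta/codemeta/raw/master/crosswalk.csv"))`, so only the first page rendered in a given day would hit GitHub.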
Zooming out, the whole design here probably needs an overhaul. As you probably know, we haven't kept up with manually adding a new `.Rmd` for each new source column in the crosswalk, so those crosswalks really aren't complete any more. Would love your opinion on this. Arguably it is quite useful to have a page with stuff like more background on maven or whatnot, but this Rmd approach clearly doesn't scale super well. @mbjones and I were just discussing this in the context of a larger overhaul for the codemeta.github.io website that would strip it down to something more minimal that is easier to maintain and keep current. The site today feels a bit bloated and stale to me, and not all that user friendly.
Related to the last is the fact that codemeta is now really two somewhat separate projects: while we set out primarily to create a crosswalk, we now basically maintain a 'new' standard and set of supporting tools, and rather separately maintain a list of crosswalk tables from other standards (largely without a lot of supporting tools except for some special cases like R, where `codemetar` crosswalks a lot more terms than are listed in the `R` crosswalk table anyway). Some ideas on how to proceed with these two pieces (e.g. should we omit or move the crosswalk stuff off of the main codemeta website?) would be helpful.
Thanks so much for all your work and contributions, it's really fantastic!
progval commented on Mar 23, 2020
Excellent!
You could also use Travis (or any other CI) to automatically build the branch with the HTML (currently master): https://docs.travis-ci.com/user/deployment/pages/ (it won't automatically rebuild on changes of crosswalk.csv, but you could set up a daily rebuild) from just the `.Rmd` files; and remove `.md` files from the hugo branch (which might need to be renamed; maybe rename it to `master` and rename the current `master` to `gh-pages`).
This way, humans never have to commit generated code.
Regarding the crosswalk, we could add a single script that generates them all, from a single input file. That would also mostly solve the multiple downloads issue (there'd be only this script and `terms.Rmd`).
Even though they don't support many package-manager/language metadata formats, AFAIK Bolognese would accept contributions in that direction.
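As a rough illustration of the "single script, single input file" idea for the crosswalk pages, the sketch below splits one crosswalk table into one small table per source column. The function name `crosswalk_tables()`, the `Property` key column, and the toy rows are assumptions made for the example; the real crosswalk.csv has more columns and different contents.

```r
# Hypothetical helper: turn one wide crosswalk table into a named list of
# two-column tables (codemeta property, source term), one per source column.
crosswalk_tables <- function(crosswalk, key_col = "Property", skip_cols = character()) {
  source_cols <- setdiff(names(crosswalk), c(key_col, skip_cols))
  tables <- lapply(source_cols, function(col) {
    # Keep only rows where this source actually has a mapped term.
    mapped <- !is.na(crosswalk[[col]]) & nzchar(crosswalk[[col]])
    data.frame(
      property = crosswalk[[key_col]][mapped],
      term = crosswalk[[col]][mapped],
      stringsAsFactors = FALSE
    )
  })
  names(tables) <- source_cols
  tables
}

# Toy input for illustration only; not the real crosswalk contents.
toy <- data.frame(
  Property = c("name", "version", "license"),
  DataCite = c("title", "version", ""),
  Maven = c("name", NA, "licenses"),
  stringsAsFactors = FALSE
)
tables <- crosswalk_tables(toy)
```

A generator script could then loop over `names(tables)` and write one page per source, so a new column in the input file would get a page without anyone adding an `.Rmd` by hand.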
I also wrote a tool running at Software Heritage that converts several formats to CodeMeta and stores it in our database.
Its reach is limited by most languages using a script in lieu of a metadata file, and we don't want to run arbitrary code on our infrastructure (though parsing with regexps seems to work in most cases, I just didn't spend much time on it).
With Travis auto-building the website, most of this would no longer be a problem. We could also make https://github.com/codemeta/codemeta a git submodule of https://github.com/codemeta/codemeta.github.io and have the build process read the local file, which would avoid downloads at build time.
You're welcome :)
Thanks for all your work as well!
cboettig commented on Mar 23, 2020
Yes, this totally should be done. It would be easiest to do so with the existing GitHub Actions script for blogdown: https://github.com/r-lib/actions/blob/master/examples/blogdown.yaml . This would avoid the extra faffing with credentials you need to do this in Travis. A PR would be great for this, I'm juggling too many things to do this anytime soon!
Re crosswalk scripts -- yeah, definitely makes sense to automate that more, contributions welcome there too!! Though the crosswalk tables we have lack important metadata about "what" a given column actually is: a link to a homepage, an icon, a title and a description would be a big help.
Re translation, linking more of those tools would be a great addition.
Thanks again!
progval commented on Mar 24, 2020
Unfortunately I'm going to be busy with another project too, but I'll keep this issue in mind.