This issue tracks the work needed before new volumes can be ingested with the new Python library.
- Provide convenience functions to create new collections, volumes, and papers. 🟢
  - Basic functionality for this has been added; how it interacts with the different indices is an open question.
- Add function to generate new bibkeys. 🟢
- Add function to add new files, with checksum calculation. 🟢
- Move normalization logic into the library. 🟡
- Move LaTeX conversion into the library. 🟡
  - Do we need our custom `latex_to_unicode`? Can we use pylatexenc instead? Should this be added as `MarkupText.from_latex`?
- Make XML serialization produce minimal diffs by respecting order of elements in existing XML file. 🟠
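For the minimal-diff item, one possible approach is to reorder the children of a freshly built element so that they follow the tag order already present in the existing XML. The sketch below is only an illustration using the standard library's `xml.etree.ElementTree`; the function name `reorder_like` is hypothetical and not part of the library, and tags that do not occur in the existing element are assumed to go at the end in their original relative order.

```python
import xml.etree.ElementTree as ET


def reorder_like(existing: ET.Element, updated: ET.Element) -> None:
    """Reorder the children of `updated` in place to follow `existing`.

    Hypothetical helper: children are ranked by the first position at
    which their tag occurs in `existing`; unknown tags are placed last.
    """
    rank = {}
    for position, child in enumerate(existing):
        # Remember only the first occurrence of each tag.
        rank.setdefault(child.tag, position)

    unknown = len(existing)  # rank shared by all tags not seen before
    # sorted() is stable, so children with equal rank keep their order.
    updated[:] = sorted(updated, key=lambda c: rank.get(c.tag, unknown))


existing = ET.fromstring(
    "<paper><title>T</title><author>A</author><pages>1-2</pages></paper>"
)
updated = ET.fromstring(
    "<paper><pages>1-2</pages><doi>x</doi><author>A</author><title>T</title></paper>"
)
reorder_like(existing, updated)
new_order = [child.tag for child in updated]
```

Here `new_order` ends up following the existing file's order, with the new `doi` element appended last. Whether new elements should instead go to a schema-defined position is left open.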
Sketch how an adapted ingestion script should function, roughly:
acl-anthology/bin/ingest_mitpress.py, lines 306 to 398 in 431c8e9
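One step an adapted ingestion script needs is the checksum for each added file (the "add new files" item above). The following is a minimal sketch, assuming the checksum is a CRC32 rendered as eight lowercase hex digits; that choice and both function names are assumptions for illustration, not the library's confirmed API.

```python
import zlib
from pathlib import Path
from typing import Union


def checksum_bytes(data: bytes) -> str:
    """Return the CRC32 of `data` as an 8-digit hex string (assumed format)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def checksum_file(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Compute the same checksum for a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        # Feed each chunk into the running CRC so that large PDFs
        # never have to be held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"
```

Reading in chunks gives the same result as hashing the whole file at once, so either function could back the "add file" convenience method.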