IntChron parser

IntChron (<https://intchron.org/>) is listed in #2 as not being included because it requires web scraping. However, after spending some time playing with it, I wonder whether that decision could be revisited.

Essentially, IntChron seems to do the same thing as c14bazAAR (systematically compile dates from existing databases), but with a web-based API. An IntChron parser would be more complicated than the existing parsers because, as far as I can tell, there is no way to extract the entire database as a single file. But it should still be possible to retrieve the data without resorting to web scraping. The key is that every HTML page on IntChron can also be accessed in csv, json, or txt format. This includes the "index" pages that eventually lead you to individual date records. I think it could be worth the extra complexity for IntChron because it does seem to include a lot of dates (for example the entire ORAU database), and it's backed by the Oxford C14 Lab, so it's likely to grow over time.
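To illustrate the point about formats, here is a minimal sketch in Python (chosen only for illustration; a real parser for c14bazAAR would be written in R). It assumes that the machine-readable version of a page is reached by appending the format as a file extension to the page's path, which is what the example URLs in this issue suggest. The helper name `as_format` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Formats mentioned in this issue as alternatives to the HTML pages.
FORMATS = ("csv", "json", "txt")


def as_format(page_url: str, fmt: str = "csv") -> str:
    """Return the URL of the same page in another format.

    Assumption: the format is selected by a file extension on the path,
    e.g. .../record/oxa/Jordan -> .../record/oxa/Jordan.csv
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("URL has no page path to convert")
    # Drop an existing known extension so that .csv can become .json, etc.
    stem, dot, ext = path.rpartition(".")
    if dot and ext in FORMATS:
        path = stem
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}.{fmt}", parts.query, parts.fragment)
    )
```

For example, `as_format("https://intchron.org/record/oxa/Jordan")` gives the csv URL used in step 3 of the list below.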

I can think of a few ways you could approach this, depending on how much flexibility you want to give the user. At the simplest, one could implement a multi-stage parser in c14bazAAR:

1. Retrieve the list of "hosts" (<https://intchron.org/host.csv>)
2. Retrieve the list of records-by-country for each host (e.g. <https://intchron.org/oxa/record.csv>)
3. Retrieve the list of sites for each country (e.g. <https://intchron.org/record/oxa/Jordan.csv>)
4. Retrieve the list of dates for each site (e.g. <https://intchron.org/record/oxa/Jordan/Dhuweila.csv>)
5. Parse and collate the dates (actually quite easy because the IntChron format is similar to c14bazAAR's)
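The steps above could be sketched roughly as follows. This is Python for illustration only, not a proposed implementation, and it rests on assumptions I have not verified against the IntChron format: that each index csv has a column holding the URL of the next level down (called `link` here, a hypothetical name), and that lines starting with `#` are comments. The function names are also made up.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List
from urllib.request import urlopen


def http_fetch(url: str) -> str:
    """Download a page and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def read_table(text: str) -> List[Dict[str, str]]:
    """Parse csv text into row dicts, skipping blank and '#' lines.

    Assumption: '#' marks comment lines; quoted fields spanning several
    lines are not handled by this simple filter.
    """
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


def crawl(
    fetch: Callable[[str], str],
    url: str,
    depth: int,
    link_column: str = "link",  # hypothetical column name
) -> Iterator[Dict[str, str]]:
    """Follow `depth` levels of index pages, then yield the leaf rows.

    Starting at host.csv, depth=3 covers hosts -> countries -> sites ->
    dates (steps 1 to 4); the yielded rows are the input to step 5.
    """
    rows = read_table(fetch(url))
    if depth == 0:
        for row in rows:
            # Record where each date came from for later collation.
            yield {**row, "source_url": url}
        return
    for row in rows:
        child = row.get(link_column)
        if child:
            yield from crawl(fetch, child, depth - 1, link_column)
```

Usage would be something like `list(crawl(http_fetch, "https://intchron.org/host.csv", depth=3))`. Passing the fetch function in as an argument makes it easy to test against canned pages, and to add caching or rate limiting later without touching the traversal.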

At the other end of the spectrum, one could write an R interface to IntChron as its own package, which c14bazAAR could then use as a dependency to retrieve either the entire database or a user-specified subset. That could be worthwhile if the IntChron standard does become widely used, but as things stand I'm not sure that it's worth the extra effort.

I'd be happy to put some work into this, but I thought I would first raise the issue and ask whether you think it is something that fits into c14bazAAR, and what the best approach to doing it might be.

IntChron parser #115
