Description
Upcoming CSL-JSON changes include support for EDTF as a date input format. I recently implemented EDTF, and I have some thoughts about how we can make use of its features in CSL.
What EDTF has that we don't
EDTF is a great format for CSL, because we have supported date ranges since forever, and some of the unofficial date formats we use resemble EDTF already. However, it adds four new things we did not have before.
- Unspecified parts of dates, using the `X` character to blot them out.
- A flag for "approximate" in addition to "uncertain".
- A datetime representation, e.g. `2019-07-16T01:57:29Z`.
- A defined calendar.
Unspecified date parts / `1999-XX` and friends
You might think that we could just add terms for `month-unspecified` and `day-unspecified` and call it a day. But I think we'd be missing out -- the spec doesn't advertise it very well, but the feature is more expressive than that.
There are a few different variations on the `XX` in EDTF level 1. In my opinion the spec should have named them like so: `19XX` => century, `199X` => decade, `1999-XX` => month of year, `1999-XX-XX` => day of year, `1999-07-XX` => day of month. Styles/locales could render `19XX` as "20th century" or "1900s" if they so wished! However, given this is academic citation, I'm not sure how useful that would be. If anyone can point to a style that might want special rendering for any of these forms, then it's something we can definitely do.
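To make the naming above concrete, here is a minimal sketch in Python of how an implementation might tell the five forms apart. The labels and the helpers `classify_unspecified` and `render_plural_years` are hypothetical names for illustration, not EDTF or CSL terms, and only the level 1 shapes listed above are handled.

```python
import re

# Hypothetical labels, following the naming suggested above.
_FORMS = [
    (re.compile(r"\d{2}XX"), "century"),
    (re.compile(r"\d{3}X"), "decade"),
    (re.compile(r"\d{4}-XX"), "month-of-year"),
    (re.compile(r"\d{4}-XX-XX"), "day-of-year"),
    (re.compile(r"\d{4}-\d{2}-XX"), "day-of-month"),
]


def classify_unspecified(edtf):
    """Return the label for an unspecified-digit date, or None if fully specified."""
    for pattern, label in _FORMS:
        if pattern.fullmatch(edtf):
            return label
    return None


def render_plural_years(edtf):
    """Render a century or decade form as e.g. '1900s' or '1990s'."""
    if classify_unspecified(edtf) not in ("century", "decade"):
        raise ValueError("not a century or decade form: " + edtf)
    return edtf.replace("X", "0") + "s"


print(classify_unspecified("1999-07-XX"))  # day-of-month
print(render_plural_years("19XX"))         # 1900s
```

A style or locale could then key a term off the label, in the same way it would for `month-unspecified`.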
Approximate
We currently have `is-uncertain-date`, the `circa` term, and `"circa": true` in CSL-JSON. For reference, EDTF encodes its uncertainties as `?` => uncertain, `~` => approximate, `%` => both.
On a basic level, you could add terms for `approximate` and `approximate-uncertain`, and also add `is-approximate-date="issued"` as a conditional test.
One complication is that EDTF makes approx/uncertain a property of each end of a date range, i.e. you can have `1999?/2003` meaning (uncertain 1999) to 2003. Our current model is insufficient for that, because it can only work with a date as a whole. You could therefore add a `certainty` date part as well, which simply renders one of the three terms or nothing, in either the single date or on each end of the range. This would be an improvement over the existing syntax even ignoring the approximate addition.
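As a rough sketch of the per-endpoint model, the following Python splits a level 1 range and records a certainty term for each end. The `Endpoint` class and the function names are hypothetical, the term names are the ones floated above, and open-ended ranges and level 2 qualifiers are deliberately ignored.

```python
from dataclasses import dataclass
from typing import List, Optional

# EDTF qualifier -> hypothetical CSL term name.
QUALIFIERS = {
    "?": "uncertain",
    "~": "approximate",
    "%": "approximate-uncertain",
}


@dataclass
class Endpoint:
    date: str                 # the date with its qualifier stripped
    certainty: Optional[str]  # one of the three terms, or None


def parse_endpoint(text: str) -> Endpoint:
    """Split a trailing ?, ~ or % off a single EDTF date."""
    if text and text[-1] in QUALIFIERS:
        return Endpoint(text[:-1], QUALIFIERS[text[-1]])
    return Endpoint(text, None)


def parse_range(text: str) -> List[Endpoint]:
    """Parse 'start/end' (or a single date) into one Endpoint per side."""
    return [parse_endpoint(part) for part in text.split("/")]


print(parse_range("1999?/2003"))
```

For `1999?/2003` this gives an uncertain 1999 and an unqualified 2003, which is exactly the information a `certainty` date part would need on each side.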
Date time representation
My favourite citation style, AGLC4, now supports citing tweets/forum posts/videos, and requires a timestamp as well as a date. It renders them like so:
Social media posts, forum posts and online videos uploaded to sites such as YouTube may be cited as follows:
Username, Title (Social Media Platform, Full Date, Time) <URL>.
... The time zone from which the post is accessed (eg ‘AEDT’) should be included if the social media platform adjusts the time based on the local time zone.
@s_m_stephenson (Scott Stephenson) (Twitter, 17 July 2017, 9:37pm AEST) <https://twitter.com/s_m_stephenson/status/887169425551441921>, ....
I don't think this will be the only one out there. We don't currently support times at all, and I think we should.
A couple of notes about this:
- You might want a bunch of new `<date-part>`s for each one, but alternatively you could have only one new `<date-part name="time" format="..." />` and just tell styles/locales to supply a time format string and reference one of the popular encodings for that.
- I'd say `<date-part name="timezone" />` as well.
- EDTF's timezones are optional but can be set to `Z` (= UTC) or a +/- UTC offset in hours or hours:minutes. They are really just offsets, not zones.
- That's not enough information to render "AEDT". If you add tz database entries like `Australia/Melbourne` that's probably enough info to query a list of known abbreviations for that tz at that time of year, DST-wise (but the abbreviations are not nearly as standardised as the tz names).
- The Temporal API's datetime format allows people to store the zone in tz database entry format in addition to the offset, i.e. `+03:00[Africa/Nairobi]`. Not sure if we'd want that (complicates edtf parsing, is technically a completely new format if we bolt it on after a valid EDTF, so no thanks) but maybe some JSON way of specifying this would help.
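To illustrate the offset-versus-zone point, here is a small Python sketch using the standard library's `zoneinfo` module (Python 3.9+, and it needs tz data available on the system). The two helper names are hypothetical. With only an offset, the best you get is a label like `UTC+10:00`; with a tz database name, the abbreviation for that time of year can be looked up.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def abbreviation_from_offset(local, offset_hours):
    """All that a bare UTC offset, as in EDTF, can tell us."""
    fixed = timezone(timedelta(hours=offset_hours))
    return local.replace(tzinfo=fixed).tzname()


def abbreviation_from_zone(local, zone_name):
    """Look up the abbreviation in force in a tz database zone at that local time."""
    return local.replace(tzinfo=ZoneInfo(zone_name)).tzname()
```

For the AGLC4 example, `abbreviation_from_offset(datetime(2017, 7, 17, 21, 37), 10)` returns `UTC+10:00`, while `abbreviation_from_zone(datetime(2017, 7, 17, 21, 37), "Australia/Melbourne")` returns `AEST`; the same call for a January date returns `AEDT`. The abbreviations come from the tz database itself, so the caveat above about them being less standardised than the zone names still applies.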
A defined calendar
AFAIK CSL has never operated within a specific calendar; it just renders what you put in. EDTF uses the ISO 8601 calendar, see my notes here on what that means: https://docs.rs/edtf/0.2.0/edtf/#notes-on-edtf-and-the-iso-8601-calendar-system. (Obviously you would generally render these in Gregorian style, i.e. 0000 renders as 1 BC, -0099 as 100 BC.) For modern dates, that's the same as we would normally write them, but in some places dates weren't written in the modern Gregorian calendar until the early 1900s (e.g. Russia, 1918). The UK only switched in 1752. That's really not that long ago, especially since some case law/legislation from before then is still cited fairly frequently.
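The year-zero arithmetic is easy to get wrong by one, so here is a tiny Python sketch of it. `render_iso_year` is a hypothetical helper, and the "BC" suffix stands in for whatever era term a locale would supply.

```python
def render_iso_year(year):
    """Render an ISO 8601 (astronomical) year number in BC/AD style.

    ISO 8601 has a year 0000, which is 1 BC, so year -N is (N + 1) BC.
    """
    if year <= 0:
        return f"{1 - year} BC"
    return str(year)


print(render_iso_year(0))     # 1 BC
print(render_iso_year(-99))   # 100 BC
print(render_iso_year(1752))  # 1752
```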
Idea 1: Accuracy of old dates
I don't think you'll find any citation styles which dictate what calendar to write dates in, but that isn't to say that the problem doesn't exist; in fact it is probably part of the problem for historians, since nobody is forcing anyone else to write what kind of date something is. We could tip the scales with a very simple feature: a configuration in a style or a locale (?) which sets the start of the modern era for dates. Any date before this could be rendered with a term for new style dates (e.g. `(n.s.)`), thus forcing people to check that it actually is a new style date.
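A sketch of what that simple feature could look like, in Python. Everything here is an assumption for illustration: the `modern_era_start` setting, the `render_with_style_marker` helper, and the choice of 14 September 1752 (the first Gregorian day in Britain) as an example cutoff. Real output would go through the style's normal date formatting rather than ISO strings.

```python
from datetime import date

# Hypothetical style/locale setting: first day of the "modern era" of dates.
UK_MODERN_ERA_START = date(1752, 9, 14)


def render_with_style_marker(d, modern_era_start, term="(n.s.)"):
    """Append the new-style term to any date before the configured cutoff."""
    rendered = d.isoformat()
    if d < modern_era_start:
        return f"{rendered} {term}"
    return rendered


print(render_with_style_marker(date(1701, 3, 1), UK_MODERN_ERA_START))
# 1701-03-01 (n.s.)
print(render_with_style_marker(date(1999, 7, 16), UK_MODERN_ERA_START))
# 1999-07-16
```

No calendar conversion happens here. The marker only asserts that the stored date is already new style, which is the nudge to go and check.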
A much more complex feature would be the configurable rendering of dates in other calendars. I'm pretty sure @fbennett had a feature for rendering the oddities of Japanese calendars, but I'm not sure we should require every CSL implementation to do complex calendar maths. It could be an optional thing. If we wanted such a feature, we could make the Unicode CLDR calendars optional. (Although, CLDR does not include Julian! How did they manage to omit it???)
Idea 2: Days of the week
Again, I don't know if any styles demand this, but until now it has not been technically possible to know which day of the week something is, because CSL didn't define a calendar. If you make CSL calendar aware, you get days of the week for free.
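For what "for free" means in practice: once a date is pinned to the proleptic Gregorian calendar, most standard libraries will give you the weekday directly. A minimal Python sketch follows; `day_of_week` is a hypothetical helper, the English names stand in for locale terms, and Python's `date` only covers years 1 to 9999.

```python
from datetime import date

# Stand-ins for locale terms; date.weekday() counts Monday as 0.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def day_of_week(iso_date):
    """Weekday name for a full YYYY-MM-DD date, read as proleptic Gregorian."""
    return WEEKDAYS[date.fromisoformat(iso_date).weekday()]


print(day_of_week("2017-07-17"))  # Monday
```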
In summary
EDTF opens up a couple of new opportunities that are worth considering. The most obviously valuable one appears to be datetimes, but there are a lot of possibilities.