forTEXT/GermAnProse

GermAnProse

Each JSON file contains the annotations for a single text. Annotator names have been replaced to preserve anonymity.

We plan to align the naming within the JSON structure more closely with the naming in our paper for the final release. Here, we provide a broad description of the structure; additional documentation will be released alongside the ChiA guidelines upon acceptance.

  • One JSON object per document
  • The ChiA annotations are:
    • mentions (for character mentions)
    • participations (for the agency data)
    • direct_speech
  • Each ChiA annotation type is a top-level key in the JSON object; its value is an object that maps each annotator to that annotator's annotations.
    • Each annotation holds a "spans" entry that refers to the annotated spans of text.
  • The narrativity annotations are stored under "events"; they do not have multiple annotators.
  • The verb class annotations are stored under "verbclasses"; they do not have multiple annotators either.
  • "keyness" is our measure of plot keyness
  • "tokens" and "sentences" are automatically created data (using spaCy)
  • "scenes" contains the scene annotations
  • "speech_info" holds the timing information in audiobooks for each reader
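
As a rough illustration of the layout described in this list, the following Python sketch loads one file and lists the annotator keys found under each ChiA layer. The helper names `load_document` and `annotators_per_layer` are hypothetical and not part of this repository. The sketch assumes only that each ChiA layer, when present, maps annotator keys to annotations; layers missing from a file are skipped.

```python
import json

# Layer names taken from the list above. Whether every file contains all
# three layers is an assumption, so missing or non-object layers are skipped.
CHIA_LAYERS = ("mentions", "participations", "direct_speech")


def load_document(path):
    """Read one annotated text (one JSON object per document)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def annotators_per_layer(doc):
    """Map each ChiA layer present in `doc` to its sorted annotator keys."""
    result = {}
    for layer in CHIA_LAYERS:
        layer_data = doc.get(layer)
        if isinstance(layer_data, dict):
            result[layer] = sorted(layer_data)
    return result
```

Calling `annotators_per_layer(load_document(path))` on a file would show at a glance which layers carry annotations from more than one annotator.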

For an example of the data structure, see below:

{
    "title": "Document Name",
    "full_text": full_text,
    "participations": {
        "annotator_a": {
            "mentions": {
                "character_a": [{"kind": "name", "spans": [[12, 15]]}]
            }
        },
        "annotator_b": ...
    },
    ...
}
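
The nesting in the example above can be walked as in the following sketch, which flattens it into one row per annotated span. The function name `iter_mention_spans` is hypothetical, and the traversal assumes exactly the nesting shown in the example (annotator, then "mentions", then character, then a list of mentions with "kind" and "spans"). The final release may name or nest these keys differently. The sketch does not interpret the span values, since this description does not state what unit they are measured in.

```python
def iter_mention_spans(doc):
    """Yield (annotator, character, kind, (start, end)) for every span.

    Follows the nesting of the example structure; keys that are missing
    are treated as empty rather than raising an error.
    """
    for annotator, annotation in doc.get("participations", {}).items():
        if not isinstance(annotation, dict):
            continue  # skip anything that is not an annotator object
        for character, mentions in annotation.get("mentions", {}).items():
            for mention in mentions:
                for start, end in mention.get("spans", []):
                    yield annotator, character, mention.get("kind"), (start, end)


# A filled-in copy of the example structure, with the elided parts left out.
example = {
    "title": "Document Name",
    "full_text": "...",
    "participations": {
        "annotator_a": {
            "mentions": {
                "character_a": [{"kind": "name", "spans": [[12, 15]]}]
            }
        }
    },
}

rows = list(iter_mention_spans(example))
```

Flattening to rows makes it straightforward to compare annotators or to load the spans into a table.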

About

A dataset of annotated German prose
