v1.6.0

@BobBorges BobBorges released this 13 Mar 09:47
dc6a269

Data formats

records.zip includes the complete records in the ParlaClarin XML format.
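
As a rough illustration of how the archive could be read without unpacking it to disk, the sketch below walks a zip file and parses every member whose name ends in .xml. The internal layout of records.zip is not described here, so the function name `iter_xml_roots` and the assumption that the records are plain .xml members are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def iter_xml_roots(zip_path: str) -> Iterator[Tuple[str, ET.Element]]:
    """Yield (member name, parsed root element) for each .xml file in a zip.

    Assumption: the records are stored as ordinary .xml members; the real
    directory layout inside records.zip is not specified in these notes.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                # Parse straight from the archive without extracting to disk.
                with archive.open(name) as member:
                    yield name, ET.parse(member).getroot()
```

Calling `iter_xml_roots("records.zip")` would then give one parsed document at a time, which keeps memory use bounded for a large archive.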

records_speeches_DECADE.ndjson.gz includes the speeches from the records in newline-delimited JSON format, aggregated by decade and compressed with gzip.
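
A minimal sketch of reading one of these files with the Python standard library follows. It streams the gzip file line by line and parses each line as one JSON object. The function name `iter_speeches` is hypothetical, and no particular field names are assumed, since the schema of the speech objects is not listed here.

```python
import gzip
import json
from typing import Iterator


def iter_speeches(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped NDJSON file."""
    # "rt" opens the compressed stream in text mode so lines arrive as str.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `for speech in iter_speeches("records_speeches_1990.ndjson.gz"): ...` would process a single decade without loading the whole file into memory; the file name here only illustrates the DECADE placeholder and is not confirmed by the release.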

Quality estimates

The quality estimates are available in the quality.zip archive.

References

A list of references can be found in the reference-list.bib file.

What's Changed

New features and data

  • Add modern pagenumbers (applying the add_modern_pagenumbers.py to the curation) by @mandlilaast in #163
  • Curate protocols step 2: fix add_uuid.py related problems. by @mandlilaast in #165
  • Protocol curation step 3: add links to pdf pages by @mandlilaast in #167
  • Curation of 2023-2025 protocols step 4: find the dates by @mandlilaast in #168
  • Feat: classify_note_seq based on the found script from older branch by @mandlilaast in #170
  • 20232425 protocols (step 8): Feat: add uuid after classifying paragraphs into notes and utterances. by @mandlilaast in #171
  • Protocol 202324 and 202425 curation (step 9): map introductions to the speaker in the metadata. by @mandlilaast in #172
  • 2023-25 Protocol curation step 10: split protocols into sections by @mandlilaast in #173
  • Feat: classify titles in protocols by @mandlilaast in #174
  • Last step of curating the 20232425 protocols by @mandlilaast in #175
  • Merge utterances by @mandlilaast in #186
  • Feat: annotate-speeches.py by @mandlilaast in #194
  • Add 2023/24 and 2024/25 protocols to the corpus by @mandlilaast in #176
  • feat: merge consecutive utterances and add next/prev tags by @ninpnin in #198
  • feat: Add newline delimiter JSON as a release file format by @ninpnin in #201

Bug fixes

Misc. chores

Full Changelog: v1.5.0...v1.6.0