Data formats
records.zip contains the complete records in the Parla-CLARIN XML format.
The records_speeches_DECADE.ndjson.gz files contain the speeches from the records in newline-delimited JSON format, grouped by decade and compressed with gzip.
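As a rough illustration of how the gzip-compressed, newline-delimited JSON files could be read, here is a minimal Python sketch using only the standard library. The function name `iter_speeches` is hypothetical and not part of this release, and the sketch makes no assumption about which fields each speech object carries; it just yields one parsed object per line.

```python
import gzip
import json
from typing import Iterator


def iter_speeches(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped NDJSON file.

    `iter_speeches` is a hypothetical helper for illustration only.
    """
    # Mode "rt" makes gzip decompress and decode to text on the fly,
    # so the file is streamed line by line rather than loaded whole.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

It could be used as `for speech in iter_speeches(path): ...`, where `path` points to one of the downloaded records_speeches_DECADE.ndjson.gz files. Streaming line by line keeps memory use flat even for large decades.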
Quality estimates
The quality estimates are available in the quality.zip archive.
References
A list of references can be found in the reference-list.bib file.
What's Changed
New features and data
- Curate step 1: classify_intros and resegment by @mandlilaast in #156
- Curation step 2: add_uuid.py by @mandlilaast in #161
- Add modern pagenumbers (applying the add_modern_pagenumbers.py to the curation) by @mandlilaast in #163
- Curate protocols step 2: fix add_uuid.py related problems. by @mandlilaast in #165
- Protocol curation step 3: add links to pdf pages by @mandlilaast in #167
- Curation of 2023-2025 protocols step 4: find the dates by @mandlilaast in #168
- Feat: classify_note_seq based on the found script from older branch by @mandlilaast in #170
- 20232425 protocols (step 8): Feat: add uuid after classifying paragraphs into notes and utterances. by @mandlilaast in #171
- Protocol 202324 and 202425 curation (step 9): map introductions to the speaker in the metadata. by @mandlilaast in #172
- 2023-25 Protocol curation step 10: split protocols into sections by @mandlilaast in #173
- Feat: classify titles in protocols by @mandlilaast in #174
- Last step of curating the 20232425 protocols by @mandlilaast in #175
- Merge utterances by @mandlilaast in #186
- Feat: annotate-speeches.py by @mandlilaast in #194
- Add 2023/24 and 2024/25 protocols to the corpus by @mandlilaast in #176
- feat: merge consecutive utterances and add next/prev tags by @ninpnin in #198
- feat: Add newline delimiter JSON as a release file format by @ninpnin in #201
Bug fixes
- fix doc-ids' zero padding in redirect URLs by @BobBorges in #153
- fix MP test by @BobBorges in #157
- fix schema test failure by @BobBorges in #158
- fix next/prev coherence by @BobBorges in #183
- failing tests at main by @BobBorges in #155
- Fix speaker mapping by @mandlilaast in #177
- Fix: MP test revamp, add baseline errors and make sure that test runs correctly. by @mandlilaast in #200
Misc. chores
- Release zip check by @ninpnin in #190
- prerelease: minor version by @ninpnin in #202
- prerelease: minor version by @BobBorges in #204
Full Changelog: v1.5.0...v1.6.0