Releases: swerik-project/riksdagen-records
v1.6.1alpha
Data formats
records.zip contains the complete records in the Parla-CLARIN XML format.
The records_speeches_DECADE.ndjson.gz files contain the speeches from the records in newline-delimited JSON format, one file per decade, compressed with gzip.
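As a rough illustration of working with the newline-delimited JSON files, the sketch below streams one gzip-compressed file line by line using only the Python standard library. The helper name iter_ndjson_gz and the example file name are assumptions made for illustration, not part of the release; the fields inside each speech object are not documented here, so the example simply prints whatever each record contains.

```python
import gzip
import json
from pathlib import Path


def iter_ndjson_gz(path):
    """Yield one parsed JSON value per non-empty line of a gzip-compressed NDJSON file."""
    # "rt" opens the gzip stream in text mode so lines arrive as str, not bytes.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical file name: substitute the decade file you actually downloaded.
    path = Path("records_speeches_1990.ndjson.gz")
    if path.exists():
        for i, speech in enumerate(iter_ndjson_gz(path)):
            print(speech)
            if i >= 2:  # only show the first three records
                break
```

Because the generator reads one line at a time, a whole decade never has to be held in memory at once.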
Quality estimates
The quality estimates are available in the quality.zip archive.
References
A list of references can be found in the reference-list.bib file.
v1.6.0
Data formats
records.zip contains the complete records in the Parla-CLARIN XML format.
The records_speeches_DECADE.ndjson.gz files contain the speeches from the records in newline-delimited JSON format, one file per decade, compressed with gzip.
Quality estimates
The quality estimates are available in the quality.zip archive.
References
A list of references can be found in the reference-list.bib file.
What's Changed
New features and data
- Curate step 1: classify_intros and resegment by @mandlilaast in #156
- Curation step 2: add_uuid.py by @mandlilaast in #161
- Add modern pagenumbers (applying the add_modern_pagenumbers.py to the curation) by @mandlilaast in #163
- Curate protocols step 2: fix add_uuid.py related problems. by @mandlilaast in #165
- Protocol curation step 3: add links to pdf pages by @mandlilaast in #167
- Curation of 2023-2025 protocols step 4: find the dates by @mandlilaast in #168
- Feat: classify_note_seq based on the found script from older branch by @mandlilaast in #170
- 20232425 protocols (step 8): Feat: add uuid after classifying paragraphs into notes and utterances. by @mandlilaast in #171
- Protocol 202324 and 202425 curation (step 9): map introductions to the speaker in the metadata. by @mandlilaast in #172
- 2023-25 Protocol curation step 10: split protocols into sections by @mandlilaast in #173
- Feat: classify titles in protocols by @mandlilaast in #174
- Last step of curating the 20232425 protocols by @mandlilaast in #175
- Merge utterances by @mandlilaast in #186
- Feat: annotate-speeches.py by @mandlilaast in #194
- Add 2023/24 and 2024/25 protocols to the corpus by @mandlilaast in #176
- feat: merge consecutive utterances and add next/prev tags by @ninpnin in #198
- feat: Add newline-delimited JSON as a release file format by @ninpnin in #201
Bug fixes
- fix doc-ids' zero padding in redirect URLs by @BobBorges in #153
- fix MP test by @BobBorges in #157
- fix schema test failure by @BobBorges in #158
- fix next/prev coherence by @BobBorges in #183
- failing tests at main by @BobBorges in #155
- Fix speaker mapping by @mandlilaast in #177
- Fix: MP test revamp, add baseline errors and make sure that the test runs correctly. by @mandlilaast in #200
Misc. chores
- Release zip check by @ninpnin in #190
- prerelease: minor version by @ninpnin in #202
- prerelease: minor version by @BobBorges in #204
Full Changelog: v1.5.0...v1.6.0
v1.5.1alpha3
Data formats
records.zip contains the complete records in the Parla-CLARIN XML format.
The records_speeches_DECADE.ndjson.gz files contain the speeches from the records in newline-delimited JSON format, one file per decade, compressed with gzip.
Quality estimates
The quality estimates are available in the quality.zip archive.
References
A list of references can be found in the reference-list.bib file.
v1.5.1alpha2
Preprocessed easy-to-use formats
Download persons_csv.zip, persons.xlsx or persons.sqlite to easily access preprocessed data.
Normal form DB for more complex processing
The persons.zip archive contains the original unmerged tables as CSVs for more complex processing.
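For the CSV tables, a minimal sketch of loading every CSV member of a zip archive into memory might look like the following. The function name read_csv_tables is hypothetical, and the code assumes that the files are UTF-8 encoded and that each one starts with a header row; neither is stated in the release notes, and the actual table names and columns are not assumed.

```python
import csv
import io
import zipfile


def read_csv_tables(zip_path):
    """Return {member name: list of row dicts} for every .csv file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore non-CSV members such as READMEs
            with archive.open(name) as raw:
                # Assumption: UTF-8 text with a header row on the first line.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables
```

Calling read_csv_tables("persons.zip") would then give one list of dictionaries per table, which can be joined in plain Python or handed to a data-frame library.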
Quality estimates
The quality estimates are available in the quality.zip archive.
References
A list of references can be found in the reference-list.bib file.
v1.5.1-alpha
Full Changelog: v1.5.0...v1.5.1-alpha
v1.5.0
What's Changed
- join split segments by @BobBorges in #112
- DOC ID test by @BobBorges in #111
- PR: Estimate the quality of segmentation (seg/note) before 1910 #108 by @mandlilaast in #117
- add test for doc date integrity by @BobBorges in #118
- Add class teardown method and refactor QE pipeline (#108) by @mandlilaast in #119
- Fix OCR Quality Estimation Script: implement teardown and plotting by @mandlilaast in #124
- Fix: requirements and package handling. Add rapidfuzz. by @mandlilaast in #138
- Pull Request: Quality Control of Dates in Records by @mandlilaast in #125
- modularize workflows by @BobBorges in #146
- add link to full pdf and info about origin by @BobBorges in #142
- annotate speeches and create speech IDs by @BobBorges in #148
- Fix: Improve mapping between speeches and MPs. by @mandlilaast in #127
- Use lighter-weight cff validator by @BobBorges in #150
- prerelease: minor version by @BobBorges in #149
New Contributors
- @mandlilaast made their first contribution in #117
Full Changelog: v1.4.2...v1.5.0
v1.4.2
What's Changed
- feat: heuristically split merged margin notes and bodytext into two paragraphs by @ninpnin in #98
- feat: heuristically split merged margin notes and bodytext into two paragraphs by @ninpnin in #100
- prerelease: patch version by @BobBorges in #101
Full Changelog: v1.4.1...v1.4.2
v1.4.1
v1.4.0
What's Changed
- Lowercase dates by @ninpnin in #70
- QE meeting dates by @MansMeg in #74
- align test/ with other repos and quality/ by @BobBorges in #80
- add reference tracking infrastructure by @BobBorges in #82
- prerelease: minor version by @BobBorges in #83
Full Changelog: v1.3.0...v1.4.0
v1.3.0
What's Changed
- revert intros changed to titles by @BobBorges in #44
- Fixing Xeller to X- eller, first batch by @ljo in #55
- pdoc docs by @BobBorges in #58
- add speaker mapping dimension by @BobBorges in #59
- add dimension by @BobBorges in #60
- prerelease workflow by @BobBorges in #61
- Quality estimation documentation by @MansMeg in #54
- undelete file(s) (content) by @BobBorges in #62
- replace kblabb urls in 2kammartid by @BobBorges in #66
- label ip debates in the 202223 parliament year by @BobBorges in #48
- date accuracy -- again by @BobBorges in #69
- prerelease: minor version by @BobBorges in #68
New Contributors
Full Changelog: v1.2.0...v1.3.0