How to best model the data? #5
dpriskorn
announced in
Announcements
Replies: 4 comments
I created this data model today for the Riksdagen open data to sentences project and I would like some feedback from the community.
The basic idea is to analyze all 160k documents and store every unique raw token and sentence in a database.
This is going to be a huge database, and I'm not sure ToolsDB can handle it (WMF recommends Trove for databases over 125 GB).
I want to store normalized tokens, and later I want to link the raw tokens to Wikidata Lexeme Form IDs.
I'm curious to see what you think.
The different tables are explained in the UML here:
https://github.com/dpriskorn/riksdagen_sentences/blob/save_to_database/diagrams/datamodel.puml
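To make the idea concrete, here is a minimal sketch of how such a schema could look, assuming tables roughly like those in the linked UML: one row per unique sentence, one row per unique raw token (with its normalized form and a slot for a later Wikidata Lexeme Form ID link), and a join table recording which tokens occur in which sentence. All table and column names here are illustrative guesses, not the project's actual ones.

```python
# Hypothetical schema sketch (SQLite for brevity); names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentence (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL UNIQUE            -- every unique sentence stored once
);
CREATE TABLE rawtoken (
    id             INTEGER PRIMARY KEY,
    text           TEXT NOT NULL UNIQUE, -- every unique raw token stored once
    normalized     TEXT NOT NULL,        -- normalized form of the token
    lexeme_form_id TEXT                  -- Wikidata Lexeme Form ID, linked later
);
CREATE TABLE sentence_rawtoken (
    sentence_id INTEGER REFERENCES sentence(id),
    rawtoken_id INTEGER REFERENCES rawtoken(id),
    PRIMARY KEY (sentence_id, rawtoken_id)
);
""")

def add_sentence(text: str) -> None:
    """Store a sentence and its tokens, deduplicating both via UNIQUE constraints."""
    conn.execute("INSERT OR IGNORE INTO sentence (text) VALUES (?)", (text,))
    (sid,) = conn.execute("SELECT id FROM sentence WHERE text = ?", (text,)).fetchone()
    for token in text.split():  # naive whitespace tokenizer, for illustration only
        conn.execute(
            "INSERT OR IGNORE INTO rawtoken (text, normalized) VALUES (?, ?)",
            (token, token.lower()),
        )
        (tid,) = conn.execute("SELECT id FROM rawtoken WHERE text = ?", (token,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO sentence_rawtoken VALUES (?, ?)", (sid, tid))

add_sentence("Riksdagen sammanträder i dag")
add_sentence("Riksdagen sammanträder i morgon")  # shared tokens are not duplicated
(n_tokens,) = conn.execute("SELECT COUNT(*) FROM rawtoken").fetchone()
print(n_tokens)  # 5 unique raw tokens across both sentences
```

With 160k documents the `sentence_rawtoken` join table will dominate the size, which is where the ToolsDB vs. Trove question becomes relevant.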