Merge pull request #4 from katjabercic/feat/matrac-project
Matrač project
- Wikidata SPARQL query adjustments (list of known exclusions)
- Added fetching from related articles and keyword extraction (with rate limiting)
- Added local LLM execution in order to categorize items
- TECH: Added Makefile with regular commands
- TECH: Added root requirements.txt that also installs dev dependencies
- TECH: Fixed GitHub test action so it does not run twice when pushing to an open PR (this PR, for example)
- Added feedback mechanism for categorization
- Updated README.md files
README.md: 86 additions & 32 deletions
@@ -8,35 +8,87 @@ For a demonstration of a page with at least one link, see for example `{baseurl}

 To install all the necessary Python packages, run:

-pip install -r requirements.txt
+```bash
+make prepare-web # Which does the necessary steps for env, db, superuser
+# OR
+pip install -r web/requirements.txt
+```
+
+Prepare an environment:
+```bash
+cp web/.env.example web/.env
+```

 Next, to create a database, run:

-python manage.py migrate
+```bash
+python manage.py migrate
+```

 In order to use the administrative interface, you need to create an admin user:

-python manage.py createsuperuser
+```bash
+python manage.py createsuperuser
+```

 Finally, to populate the database, run

-python manage.py import_wikidata
+```bash
+python manage.py import_wikidata
+# OR
+make populate-db
+```
+
+* In order to fetch wikipedia articles and extract keywords from them:
+```bash
+make install-scispacy
+```
+then configure your email `WIKIPEDIA_CONTACT_EMAIL`in [source_wikidata.py](web/slurper/source_wikidata.py)
+* This is needed
+* Then run the database population (make sure your db is cleared)
+
+

 If you ever want to repopulate the database, you can clear it using

-python manage.py clear_wikidata
+```bash
+python manage.py clear_wikidata
+```
+
+### To run the categorizer
+The categorizer is setup to work with several models, divided into free and paid.
+All of them are run locally, so expect some performance hits. The models are downloaded when the categorizer is
+ran initially, and by default the free models are used.
+
+The database needs to be filled in before running it, so:
+```bash
+make populate-db
+```
+then
+```bash
+make categorize
+```
+
+There are some known existing issues that have some inline fixes, such as `gpt2` getting stuck
+and returning the same prompt, then few times`---\n\n\n---`.
+
+For more details see [categorizer readme](web/categorizer/README.md).

 ## Notes for developers

 In order to contribute, install [Black](https://github.com/psf/black) and [isort](https://pycqa.github.io/isort/) autoformatters and [Flake8](https://flake8.pycqa.org/) linter.
-
-pip install black isort flake8
+```bash
+make install-dev
+```

 You can run all three with
-
-isort .
-black .
-flake8
+```bash
+make fix-files
+# Or manually
+isort .
+black .
+flake8
+```

 or set up a Git pre-commit hook by creating `.git/hooks/pre-commit` with the following contents:

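Reviewer note: the README text in this hunk asks for a `WIKIPEDIA_CONTACT_EMAIL` before fetching Wikipedia articles, and the commit message mentions rate limiting. The diff does not show how `source_wikidata.py` implements either, so the following is only a minimal sketch of the general idea, assuming a fixed minimum interval between requests and a contact address in the `User-Agent` header. The names `MinIntervalLimiter`, `build_headers`, and `example-slurper` are hypothetical and do not come from the repository.

```python
import time


class MinIntervalLimiter:
    """Keep successive calls to wait() at least `min_interval` seconds apart.

    Hypothetical helper for illustration; not taken from the project.
    The clock and sleep functions are injectable so the class can be
    exercised without real delays.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_headers(contact_email, app_name="example-slurper"):
    """Build request headers that identify the client and a contact address."""
    return {"User-Agent": f"{app_name}/0.1 (mailto:{contact_email})"}
```

A caller would invoke `limiter.wait()` immediately before each HTTP request and pass `build_headers(...)` to whichever HTTP client the project uses.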
@@ -47,35 +99,37 @@ black . && isort . && flake8
 ```

 Each time after you change a model, make sure to create the appropriate migrations:
-
-python manage.py makemigrations
+```bash
+python manage.py makemigrations
+```

 To update the database with the new model, run:
-
+```bash
 python manage.py migrate
+```

 ## Instructions for Katja to update the live version
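
Reviewer note: the README text added in this commit says that `gpt2` can get stuck, returning the prompt itself followed by repeated `---\n\n\n---`, and that inline fixes exist. Those fixes are not visible in this diff. As a hedged illustration of one way such output could be detected, the sketch below strips an echoed prompt and checks whether anything other than dashes and whitespace remains. The function names are hypothetical and this is not the project's actual fix.

```python
import re

# Matches text made up only of whitespace and dashes (including the empty string).
_SEPARATOR_ONLY = re.compile(r"[\s\-]*")


def strip_echoed_prompt(prompt, output):
    """Remove the prompt from the start of the output if the model echoed it."""
    if output.startswith(prompt):
        return output[len(prompt):]
    return output


def is_degenerate(prompt, output):
    """Return True when the output carries no content beyond the prompt.

    Assumption for illustration: an output is treated as degenerate if,
    after removing an echoed prompt, only dashes and whitespace are left.
    """
    continuation = strip_echoed_prompt(prompt, output)
    return _SEPARATOR_ONLY.fullmatch(continuation) is not None
```

A caller could retry the generation or fall back to another model whenever `is_degenerate` returns `True`.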
'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'function assigning numbers to some subsets of a set, which could be seen as a generalization of length, area, volume and integral'},
"itemDescription": {"xml:lang": "en", "type": "literal", "value": "function assigning numbers to some subsets of a set, which could be seen as a generalization of length, area, volume and integral"},
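
Reviewer note: the two lines above appear to be the same Wikidata SPARQL result term, written once with single quotes (a Python dict literal) and once with double quotes (which is also valid JSON). The sketch below shows that both forms parse to the same structure and how the literal value can be read from it. The strings are shortened stand-ins for those lines, and `literal_value` is a hypothetical helper, not a function from the repository.

```python
import ast
import json


def literal_value(binding, key):
    """Return the string value of a literal term in one result row, else None."""
    term = binding.get(key)
    if term is None or term.get("type") != "literal":
        return None
    return term["value"]


# Shortened stand-ins for the two lines in the diff (value text truncated here).
as_python_repr = (
    "{'itemDescription': {'xml:lang': 'en', 'type': 'literal', "
    "'value': 'function assigning numbers to some subsets of a set'}}"
)
as_json = (
    '{"itemDescription": {"xml:lang": "en", "type": "literal", '
    '"value": "function assigning numbers to some subsets of a set"}}'
)

from_repr = ast.literal_eval(as_python_repr)  # parses a Python literal safely
from_json = json.loads(as_json)               # parses standard JSON
```

Only the double-quoted form is accepted by `json.loads`; the single-quoted form needs `ast.literal_eval`, which is one reason a fixture might be switched to double quotes.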