Commit 905af42

Merge pull request #81 from ARGOeu/devel
Version 1.0.2
2 parents ff59fc6 + efd8c5d commit 905af42

39 files changed: 50651 additions, 275 deletions

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -1,3 +1,11 @@
+# The following are handy to ignore in this specific project
+# please ignore generated folders with results such as /data and /report
+/data
+/report
+
+# please ignore changes in the configuration file. If default configuration file structure is changed please override this rule with git add -f
+/config.yaml
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

README.md

Lines changed: 61 additions & 10 deletions
@@ -3,15 +3,15 @@ A framework for counting the recommender metrics
 
 # Preprocessor v.0.2
 <p align="center">
-<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/Preprocessor.png">
-<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/Preprocessor.png" width="70%"/>
+<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/master/docs/Preprocessor.png">
+<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/master/docs/Preprocessor.png" width="70%"/>
 </a>
 </p>
 
 # RS metrics v.0.2
 <p align="center">
-<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/RSmetrics.png">
-<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/RSmetrics.png" width="70%"/>
+<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/master/docs/RSmetrics.png">
+<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/master/docs/RSmetrics.png" width="70%"/>
 </a>
 </p>
 
@@ -61,14 +61,38 @@ optional arguments:
 ```
 
 8. Configure `./preprocessor.py` by editing the `config.yaml` or providing another with `-c`:
-<p align="center">
-<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/preprocessor-config.png">
-<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/preprocessor-config.png" width="70%"/>
-</a>
-</p>
+```yaml
+
+# Set the desired connector (e.g. MongoDB)
+Source:
+  MongoDB:
+    host: localhost
+    port: 27017
+    db: recommender_dev
+
+User:
+  export: true
+
+Service:
+  # if true it keeps only published, otherwise all
+  # this has an effect in exporting when from is set to 'source'
+  # and also in metrics calculations where service is considered
+  published: true
 
+  # Use the EOSC-Marketplace webpage
+  # to associate page_id and service_id
+  download: true
+  path: ./page_map
 
-9. Run from terminal: `./rsmetrics.py` to run RSmetrics
+  export: true
+  from: 'page_map' # or 'source'
+
+# Calculate source's metrics
+Metrics: true
+
+```
+
+9. Run from terminal: `./rsmetrics.py --users --services` to run RSmetrics and include the `users.csv` and `services.csv` files generated by the Preprocessor
 ```bash
 _____ _____ _ _
 | __ \ / ____| | | (_)
@@ -151,4 +175,31 @@ chmod u+x ./get_service_catalog.py
 ./get_service_catalog.py
 ```
 
+#### Serve Evaluation Reports as a Service
+
+The `webservice` folder hosts a simple webservice, implemented with the Flask framework, which can be used to host the report results.
+
+__Note__: Please make sure you work in a virtual environment and have already installed the required dependencies by issuing
+`pip install -r requirements.txt`
+
+The webservice application serves two endpoints:
+- `/` : This is the frontend webpage that displays the Report Results in a UI
+- `/api` : This API call returns the evaluation metrics in JSON format
+
+To run the webservice issue:
+```
+cd ./webservice
+flask run
+```
+
+By default the webservice runs on localhost:5000. You can override this by issuing, for example:
+```
+flask run -h 127.0.0.1 -p 8080
+```
+
+There is an env variable `RS_EVAL_METRIC_SOURCE` which directs the webservice to the generated `metrics.json` file produced after the evaluation process.
+By default this honors this repo's folder structure and points to the root `/data/metrics.json` path.
+
+You can override this by editing the `.env` file inside the `/webservice` folder, or by specifying the `RS_EVAL_METRIC_SOURCE` variable accordingly before executing the `flask run` command.
+
 _Tested with python 3.9_
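
The `RS_EVAL_METRIC_SOURCE` lookup described in the README hunk above could be sketched as follows. This is only an illustration under stated assumptions, not the webservice's actual code: the helper name `resolve_metrics_path` is hypothetical, and the relative default `data/metrics.json` is inferred from the README's mention of the root `/data/metrics.json` path.

```python
import os
from pathlib import Path

# Assumed default, following the README: <repo root>/data/metrics.json
DEFAULT_METRICS_PATH = Path("data") / "metrics.json"


def resolve_metrics_path(environ=None):
    """Return the metrics file path, preferring RS_EVAL_METRIC_SOURCE when set.

    `environ` is any mapping of environment variables; it defaults to
    os.environ so the function can be exercised without touching the
    real environment.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("RS_EVAL_METRIC_SOURCE", "").strip()
    # An empty or missing variable falls back to the repo default
    return Path(override) if override else DEFAULT_METRICS_PATH
```

Setting the variable in the shell before `flask run`, or in the `.env` file, would then take precedence over the default.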

config.yaml

Lines changed: 25 additions & 21 deletions
@@ -5,30 +5,34 @@ Source:
     port: 27017
     db: recommender_dev
 
-User:
-  export: true
-  #from: 'user_actions'
-  #from: 'recommendations'
-  from: 'source'
+# The database where the Preprocessor's
+# and RSmetrics data are stored
+Datastore:
+  MongoDB:
+    host: localhost
+    port: 27017
+    db: rsmetrics
 
 Service:
   # Use the EOSC-Marketplace webpage
-  # to associate page_id and service_id
-  download: true
-  path: ./page_map
-
-  export: true
-  #from: 'user_actions'
-  #from: 'recommendations'
-  from: 'source'
-  #from: 'page_map'
-
-  published: false # applies only on source option
-
-User-actions:
-  merge: false # not implemented yet
-
-# Calculate source's metrics
+  # to retrieve resources and
+  # associate the page_id and the service_id
+  Portal:
+    download: true
+    path: ./page_map
+
+  # if true it keeps only published, otherwise all
+  # this has an effect in exporting when from is set to 'source'
+  # and also in metrics calculations where service is considered
+  published: true
+
+  # which origin to use to retrieve Resources
+  # two options available:
+  # - 'source': use the Connector
+  # - 'page_map': use the EOSC Marketplace
+  from: 'page_map' # or 'source'
+
+# Calculate source's metrics (pre-metrics)
 Metrics: true
 
 

environment.yml

Lines changed: 13 additions & 2 deletions
@@ -25,20 +25,31 @@ dependencies:
   - zlib=1.2.11=h7f8727e_4
   - pip:
     - beautifulsoup4==4.10.0
+    - certifi==2021.10.8
     - charset-normalizer==2.0.12
+    - click==8.1.3
+    - Flask==2.1.2
     - idna==3.3
-    - joblib==1.1.0
+    - importlib-metadata==4.11.4
+    - itsdangerous==2.1.2
+    - Jinja2==3.1.2
+    - joblib==1.2.0
+    - MarkupSafe==2.1.1
     - natsort==8.1.0
     - numpy==1.22.3
     - pandas==1.4.2
     - pymongo==4.1.0
     - python-dateutil==2.8.2
+    - python-dotenv==0.20.0
     - pytz==2022.1
-    - pyyaml==6.0
+    - PyYAML==6.0
     - requests==2.27.1
     - scikit-surprise==1.1.1
     - scipy==1.8.0
     - six==1.16.0
     - soupsieve==2.3.2
     - surprise==0.1
     - urllib3==1.26.9
+    - Werkzeug==2.1.2
+    - zipp==3.8.0
+    - flask-pymongo==2.3.0

get_service_catalog.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ def get_service_catalog_items(content):
     for item in results:
         a = item.findChildren("a", recursive=False)[0]
         row = [int(item.attrs["data-service-id"]),
-               item.text.strip(), a['href']]
+               a.text.strip(), a['href']]
         rows.append(row)
     # sort rows by id
     rows = sorted(rows, key=lambda x: x[0])

metric_descriptions/README.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Metric Descriptions folder
+
+This folder is meant to contain detailed YAML files that define, in a structured way, the implementation details of each metric.
+To add a new detailed description to this folder, please consult the first file added here, `diversity.yml`, and structure
+the information accordingly.
+
+### Important Note on filenames
+The filename should consist of the name of the metric as used in the `metrics.json` output, followed by the extension `.yml`.
+For example, the short name used in `metrics.json` for the metric Shannon Diversity is `diversity`, so the filename is `diversity.yml`.
+
+### Multiline values
+In YAML fields that need to hold multiline string content, please use the `>` operator.
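
The filename rule from the note above can be checked mechanically. The following is a minimal sketch, assuming the metric short names are already available as a list of strings; both helper names are hypothetical and not part of the repository.

```python
def description_filename(metric_key):
    """Map a metric short name from metrics.json to its description file."""
    return f"{metric_key}.yml"


def missing_descriptions(metric_keys, existing_filenames):
    """Return the metric keys that have no matching .yml description file."""
    existing = set(existing_filenames)
    return [key for key in metric_keys
            if description_filename(key) not in existing]
```

For instance, `missing_descriptions(["diversity"], ["diversity.yml"])` returns an empty list, since the `diversity` metric is covered by `diversity.yml`.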
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: Catalog Coverage
+
+summary: >
+  The percentage (%) of the division of the unique services found in recommendations to the total number of published services
+
+description: >
+  The Catalog Coverage is described by the formula $$\frac{unique\_rec\_services}{services}$$
+
+output:
+  type: float
+  min: 0
+  max: 100
+  comment: Catalog Coverage is 0 when none of the services is being recommended, and 100 when all of them are being recommended.
+
+prerequisites:
+  - all available recommendations
+  - all available services
+
+process:
+  - step: Retrieve recommendations
+    details: >
+      Retrieve all available recommendations found in source
+  - step: Gather all unique services
+    details: >
+      Gather all unique services found in all available recommendations
+  - step: Retrieve services
+    details: >
+      Retrieve all available published services found in source
+  - step: Calculate ratio
+    details: >
+      Calculate the percentage (%) of the division of the unique services found in recommendations to the total number of published services
+
+# This is optional for visual stylization of the metric when displayed on the report
+style:
+  icon: pe-7s-box2
+  color: bg-malibu-beach
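
The process steps in the Catalog Coverage description above can be sketched in a few lines. This is an illustration only, not the RS Metrics implementation: the function name and the plain-list inputs are assumptions, and intersecting with the published set (so the result cannot exceed 100) is a choice made here, not something the description states.

```python
def catalog_coverage(recommended_service_ids, published_service_ids):
    """Percentage of published services that appear in recommendations."""
    published = set(published_service_ids)
    if not published:
        raise ValueError("at least one published service is required")
    # Assumption: recommended ids that are not published are ignored
    unique_recommended = set(recommended_service_ids) & published
    return 100.0 * len(unique_recommended) / len(published)
```

With four published services of which two are ever recommended, the sketch yields 50.0, matching the 0 to 100 range given in the `output` section.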
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+name: Click-Through Rate
+
+summary: >
+  The number of user clicks through recommendations panels divided by the total times recommendation panels were presented to users.
+
+description: >
+  The number of user clicks through recommendations panels divided by the total times recommendation panels were presented to users. Takes into account all historical data of user actions. The metric is expressed by the formula: $$Click-Through Rate=\frac{clicks}{views}$$
+output:
+  type: float
+  min: 0
+  max: +inf
+  comment: A value of 0 indicates that no clicks through recommendations panels occurred
+
+prerequisites:
+  - all available user actions
+
+process:
+  - step: Retrieve user actions with recommendation panel
+    details: >
+      Get only the user actions that present a recommendation panel to the user in the source page. Those are actions with the following source paths: (i) /services, (ii) /services/, (iii) /services/c/{any category name}
+  - step: Count user actions with recommendation panel
+    details: >
+      Count the items in the above list as they represent the times recommendations panels were presented to the users of the portal
+  - step: Filter list
+    details: >
+      Narrow the above list into a new subset by selecting only user actions that originate from a recommendation panel. Those are actions that have the 'recommendation' string in the Action column
+  - step: Count user actions with clicks through recommendation panel
+    details: >
+      Count the items in the subset as they represent the times users clicked through recommendations
+  - step: Calculate ratio
+    details: >
+      Divide the items of the subset with the items of the first list to get the click-through rate
+
+# This is optional for visual stylization of the metric when displayed on the report
+style:
+  icon: pe-7s-mouse
+  color: bg-grow-early
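
The Click-Through Rate steps described above could be sketched as below. This is a hedged illustration, not the project's code: user actions are assumed to be dictionaries with hypothetical `source_path` and `action` keys, and returning 0.0 when no panel was ever shown is a choice made here to avoid a division by zero.

```python
import re

# Source paths that show a recommendation panel, per the description:
# /services, /services/ and /services/c/{any category name}
PANEL_PATH = re.compile(r"/services(?:/|/c/[^/]+)?")


def click_through_rate(user_actions):
    """Clicks through recommendation panels divided by panel views."""
    # Steps 1-2: actions whose source page presents a recommendation panel
    panel_actions = [a for a in user_actions
                     if PANEL_PATH.fullmatch(a["source_path"])]
    # Steps 3-4: of those, actions that originate from the panel itself
    clicks = [a for a in panel_actions if "recommendation" in a["action"]]
    if not panel_actions:
        return 0.0  # assumption: no views means a rate of 0
    # Step 5: ratio of the subset to the first list
    return len(clicks) / len(panel_actions)
```

Because the click subset is filtered from the panel-view list, this sketch always returns a value between 0 and 1.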
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+name: Diversity Gini Index
+
+summary: >
+  Measures Recommendations' diversity. The index is 0 when all items are chosen equally often, and 1 when a single item is always chosen.
+
+description: >
+  The diversity (\(G\)) of the recommendations according to Gini Index. The index is 0 when all items are chosen equally often,
+  and 1 when a single item is always chosen
+  (see book \(\href{https://link.springer.com/10.1007/978-1-4939-7131-2_110158}{https://link.springer.com/10.1007/978-1-4939-7131-2_110158}\)). Generally, the Gini Index mathematical expression is defined as:
+  $$G=\frac{1}{n-1}\sum_{j=1}^{n}(2j-n-1)p(i_j)$$ where \(i_1,\ldots,i_n\) is the list of items ordered according to increasing \(p(i)\) and each item \(i\) accounts for a proportion \(p(i)\) of user recommendations. In RS Metrics the computation is determined by the following formula:
+  $$Diversity=\frac{1}{n-1}\sum_{j=1}^{n}(2j-n-1)\left(\frac{count(j)}{recommendations}\right)$$
+
+output:
+  type: float
+  min: 0
+  max: 1
+  comment: The index is 0 when all items are chosen equally often, and 1 when a single item is always chosen.
+
+prerequisites:
+  - recommendations without anonymous users
+  - all available services
+
+process:
+  - step: Clean up
+    details: >
+      Recommendations clean up; entries removal where users or services are not found in "users" or "services" files accordingly
+  - step: Services Impact
+    details: >
+      Calculation of the impact of the services, by counting how many times each service i was suggested to all possible users: count(j)
+  - step: Sort Services Impact from low to high
+    details: >
+      Sort the number of how many times each service (i.e. i) was suggested from the lower to the higher value, in order to apply the respective weight (j). The computation includes services with 0 recommendation occurrence
+  - step: Recommended Probability of the Services
+    details: >
+      For each service calculate its recommended probability by dividing the number of service's occurrence found in the recommendations to the total number of recommendations
+  - step: Service-based product computation
+    details: >
+      Calculation of the product of the recommended probability from previous step and services' respective index j, for each service individually
+  - step: Gini Index computation
+    details: >
+      Computation of the overall value by summing all values from previous step
+
+# This is optional for visual stylization of the metric when displayed on the report
+style:
+  icon: pe-7s-shuffle
+  color: bg-plum-plate
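
The Gini Index formula and process steps above translate fairly directly into code. The sketch below is an assumption-laden illustration rather than the RS Metrics implementation: the function name and list inputs are hypothetical, and the clean-up step is reduced to dropping recommendations for services that are not in the service list.

```python
from collections import Counter


def gini_diversity(recommended_service_ids, all_service_ids):
    """Gini index of recommendations: 0 = evenly spread, 1 = one service only."""
    services = list(dict.fromkeys(all_service_ids))  # unique, order preserved
    n = len(services)
    if n < 2:
        raise ValueError("at least two services are required")
    known = set(services)
    # Services impact: how often each known service was recommended
    counts = Counter(s for s in recommended_service_ids if s in known)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recommendations for known services")
    # Sort low to high, keeping services with 0 occurrences
    sorted_counts = sorted(counts.get(s, 0) for s in services)
    # Sum of (2j - n - 1) * p(i_j) with j starting at 1, scaled by 1/(n-1)
    weighted = sum((2 * j - n - 1) * c / total
                   for j, c in enumerate(sorted_counts, start=1))
    return weighted / (n - 1)
```

Keeping the zero-count services in the sorted list matters: without them a catalog where only one of four services is ever recommended would not reach the maximum value of 1.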

metric_descriptions/diversity.yml

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+name: Diversity Shannon Entropy
+
+summary: >
+  Measures Recommendations' diversity. The entropy is 0 when a single item is always chosen or recommended,
+  and log n when n items are chosen or recommended equally often.
+
+description: >
+  The diversity (\(H\)) of the recommendations according to Shannon Entropy. The entropy is 0 when a single item
+  is always chosen or recommended, and log(n) when n items are chosen or recommended equally often
+  (see book \(\href{https://link.springer.com/10.1007/978-1-4939-7131-2_110158}{https://link.springer.com/10.1007/978-1-4939-7131-2_110158}\)). Generally, the Shannon Entropy mathematical expression is defined as:
+  $$H=-\sum_{i=1}^{n}p(i)\log_2 p(i) $$ In RS Metrics the computation is determined by the following formula:
+  $$Diversity=-\sum_{i=1}^{services}\left(\frac{count(i)}{recommendations}\right)\log_2 \left(\frac{count(i)}{recommendations}\right)$$
+
+output:
+  type: float
+  min: 0
+  max: +\(\infty\)
+  comment: The entropy is 0 when a single item is always chosen or recommended, and log n when n items are chosen or recommended equally often.
+
+prerequisites:
+  - recommendations without anonymous users
+  - all available services
+
+process:
+  - step: Clean up
+    details: >
+      Recommendations clean up; entries removal where users or services are not found in "users" or "services" files accordingly
+  - step: Services Impact
+    details: >
+      Calculation of the impact of the services, by counting how many times each service i was suggested to all possible users: count(i)
+  - step: Recommended Probability of the Services
+    details: >
+      For each service calculate its recommended probability by dividing the number of service's occurrences found in the recommendations to the total number of recommendations
+  - step: Service-based product computation
+    details: >
+      Calculation of the product of the recommended probability from previous step and the logarithmic value of it, for each service individually
+  - step: Shannon Entropy computation
+    details: >
+      Computation of the overall value by summing all values from previous step
+
+# This is optional for visual stylization of the metric when displayed on the report
+style:
+  icon: pe-7s-way
+  color: bg-sunny-morning
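
The Shannon Entropy formula above can be sketched as follows. This is an illustration under assumptions, not the RS Metrics code: the function name is hypothetical, the clean-up step is assumed to have been applied to the input already, and services that were never recommended are skipped because their term \(p \log_2 p\) is taken as 0.

```python
import math
from collections import Counter


def shannon_diversity(recommended_service_ids):
    """Shannon entropy (base 2) of the recommended-service distribution."""
    # Services impact: count(i) for every service that was recommended
    counts = Counter(recommended_service_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one recommendation is required")
    # Sum p(i) * log2 p(i) over recommended services, then negate
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

With four services recommended equally often the result is \(\log_2 4 = 2\), and with a single service always recommended it is 0, in line with the `comment` field above.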
