|
4 | 4 | """ |
5 | 5 | Evaluate production S2AND models (SPECTER1 vs SPECTER2) on various datasets. |
6 | 6 |
|
| 7 | +
|
| 8 | +In this script we try to answer the question: if we deploy SPECTER2, will S2AND care,
| 9 | +both with retraining and without retraining?
| 10 | +
|
| 11 | +This is done with s2and-mini. Ai2 employees can find it at s3://ai2-s2-research/s2and/s2and-mini/
| 12 | +
|
| 13 | +With retraining (random seed 42); B3 tuples are (precision, recall, F1):
| 14 | +
|
| 15 | +Performance with SPECTERv1 data, on arnetminer (B3): (0.922, 0.985, 0.952) |
| 16 | +Performance with SPECTERv2 data, on arnetminer (B3): (0.93, 0.988, 0.958) |
| 17 | +
|
| 18 | +Performance with SPECTERv1 data, on inspire (B3): (0.958, 0.974, 0.966) |
| 19 | +Performance with SPECTERv2 data, on inspire (B3): (0.995, 0.959, 0.977) |
| 20 | +
|
| 21 | +Performance with SPECTERv1 data, on kisti (B3): (0.951, 0.971, 0.961) |
| 22 | +Performance with SPECTERv2 data, on kisti (B3): (0.946, 0.98, 0.963) |
| 23 | +
|
| 24 | +Performance with SPECTERv1 data, on pubmed (B3): (0.849, 0.988, 0.913) |
| 25 | +Performance with SPECTERv2 data, on pubmed (B3): (0.86, 0.988, 0.92) |
| 26 | +
|
| 27 | +Performance with SPECTERv1 data, on qian (B3): (0.936, 0.943, 0.94) |
| 28 | +Performance with SPECTERv2 data, on qian (B3): (0.95, 0.964, 0.957) |
| 29 | +
|
| 30 | +Performance with SPECTERv1 data, on zbmath (B3): (0.966, 0.984, 0.975) |
| 31 | +Performance with SPECTERv2 data, on zbmath (B3): (0.975, 0.991, 0.983) |
| 32 | +
|
| 33 | +--- |
| 34 | +
|
| 35 | +Without retraining:
| 36 | +
|
| 37 | +Performance with SPECTERv1 data, on arnetminer (B3): (0.977, 0.982, 0.979) |
| 38 | +Performance with SPECTERv2 data, on arnetminer (B3): |
| 39 | +
|
| 40 | +Performance with SPECTERv1 data, on inspire (B3): (0.993, 0.964, 0.978) |
| 41 | +Performance with SPECTERv2 data, on inspire (B3): |
| 42 | +
|
| 43 | +Performance with SPECTERv1 data, on kisti (B3): (0.96, 0.957, 0.959) |
| 44 | +Performance with SPECTERv2 data, on kisti (B3): |
| 45 | +
|
| 46 | +Performance with SPECTERv1 data, on pubmed (B3): (1.0, 0.968, 0.984) |
| 47 | +Performance with SPECTERv2 data, on pubmed (B3): |
| 48 | +
|
| 49 | +Performance with SPECTERv1 data, on qian (B3): (0.985, 0.955, 0.969) |
| 50 | +Performance with SPECTERv2 data, on qian (B3): |
| 51 | +
|
| 52 | +Performance with SPECTERv1 data, on zbmath (B3): (0.967, 0.955, 0.961) |
| 53 | +Performance with SPECTERv2 data, on zbmath (B3): |
| 54 | +
|
| 55 | +
|
7 | 56 | Usage: |
8 | 57 | # Evaluate on inventors_s2and (default) |
9 | 58 | python scripts/eval_prod_models.py |
|
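To make the with-retraining numbers in the docstring easier to compare, here is a minimal sketch that reads each B3 tuple as (precision, recall, F1). That ordering is an assumption, though the arithmetic supports it. The sketch checks that each reported F1 matches the harmonic mean of the first two values and lists the SPECTERv2 minus SPECTERv1 F1 difference per dataset. The names `RESULTS`, `f1_from`, `tuples_consistent`, and `f1_deltas` are hypothetical and are not part of `scripts/eval_prod_models.py`.

```python
# B3 results copied from the docstring (with retraining, random seed 42).
# Assumption: each tuple is ordered (precision, recall, F1).
RESULTS = {
    "arnetminer": {"v1": (0.922, 0.985, 0.952), "v2": (0.93, 0.988, 0.958)},
    "inspire": {"v1": (0.958, 0.974, 0.966), "v2": (0.995, 0.959, 0.977)},
    "kisti": {"v1": (0.951, 0.971, 0.961), "v2": (0.946, 0.98, 0.963)},
    "pubmed": {"v1": (0.849, 0.988, 0.913), "v2": (0.86, 0.988, 0.92)},
    "qian": {"v1": (0.936, 0.943, 0.94), "v2": (0.95, 0.964, 0.957)},
    "zbmath": {"v1": (0.966, 0.984, 0.975), "v2": (0.975, 0.991, 0.983)},
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def tuples_consistent(results, tol=0.002):
    """Check that every reported F1 matches f1_from(P, R) within rounding."""
    return all(
        abs(f1_from(p, r) - f1) <= tol
        for per_dataset in results.values()
        for (p, r, f1) in per_dataset.values()
    )


def f1_deltas(results):
    """SPECTERv2 F1 minus SPECTERv1 F1 for each dataset, rounded to 3 places."""
    return {
        name: round(per_dataset["v2"][2] - per_dataset["v1"][2], 3)
        for name, per_dataset in results.items()
    }


if __name__ == "__main__":
    print("tuples consistent with (P, R, F1):", tuples_consistent(RESULTS))
    for name, delta in f1_deltas(RESULTS).items():
        print(f"{name}: F1 delta (v2 - v1) = {delta:+.3f}")
```

On these six datasets the retrained SPECTERv2 F1 is higher than SPECTERv1 in every case, by between 0.002 (kisti) and 0.017 (qian). The without-retraining comparison cannot be made the same way until the SPECTERv2 rows are filled in.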