Commit 25a12a4
## Problem
The normalizer stripped parenthesized conference acronyms such as (CVPR),
(NeurIPS), and (ICCV) during text cleaning, so major conferences could not be
matched against databases like OpenAlex. As a result, legitimate conferences
were reported as UNKNOWN instead of LEGITIMATE.
## Solution
### 1. Extract acronyms as aliases (normalizer.py)
- Added _extract_acronyms() method to identify conference/journal acronyms
- Uses heuristics: primarily uppercase (≥50%), 2-20 chars, starts uppercase
- Filters out metadata keywords (ISSN, online, invited, etc.)
- Extracts acronyms like CVPR, NeurIPS, ICCV before text cleaning
- Adds extracted acronyms to aliases list for backend matching
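
The heuristics listed above can be sketched roughly as follows. This is an illustration only, not the code from normalizer.py: the function name `extract_acronyms`, the regular expression, and the contents of `METADATA_KEYWORDS` (beyond the three keywords named in this commit message) are assumptions.

```python
import re

# Assumed keyword set: the commit message names only ISSN, online, and invited.
METADATA_KEYWORDS = {"issn", "online", "invited"}

# Matches the text inside one pair of non-nested parentheses.
PAREN_RE = re.compile(r"\(([^()]+)\)")


def extract_acronyms(text: str) -> list[str]:
    """Return parenthesized tokens that look like venue acronyms."""
    acronyms: list[str] = []
    for match in PAREN_RE.finditer(text):
        candidate = match.group(1).strip()
        # Length bound: 2-20 characters.
        if not 2 <= len(candidate) <= 20:
            continue
        # Must start with an uppercase letter.
        if not candidate[0].isupper():
            continue
        # Skip metadata such as "(ISSN 1234-5678)" or "(Online)".
        words = set(re.findall(r"[a-z]+", candidate.lower()))
        if words & METADATA_KEYWORDS:
            continue
        # At least half of the letters must be uppercase.
        letters = [c for c in candidate if c.isalpha()]
        if not letters:
            continue
        upper_ratio = sum(c.isupper() for c in letters) / len(letters)
        if upper_ratio >= 0.5 and candidate not in acronyms:
            acronyms.append(candidate)
    return acronyms
```

Under these assumptions, mixed-case acronyms such as NeurIPS still qualify (4 of 7 letters are uppercase), while ordinary capitalized words such as "Volume" do not.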
### 2. Enable alias fallback in OpenAlex backend (openalex_analyzer.py)
- If normalized name not found, iterates through aliases
- Stops at first successful match
- Logs which alias was used for debugging
- Includes tried aliases in NOT_FOUND response data
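
A minimal sketch of the fallback flow described above, assuming a hypothetical `search` callable that returns a record or `None`. The function name `lookup_with_aliases` and the dictionary keys are illustrative and are not taken from openalex_analyzer.py.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def lookup_with_aliases(
    name: str,
    aliases: list[str],
    search: Callable[[str], Optional[dict[str, Any]]],
) -> dict[str, Any]:
    """Try the normalized name first, then each alias until one matches."""
    result = search(name)
    if result is not None:
        return {"status": "found", "matched_name": name, "data": result}

    tried: list[str] = []
    for alias in aliases:
        tried.append(alias)
        result = search(alias)
        if result is not None:
            # Record which alias produced the match, for debugging.
            logger.debug("Matched %r via alias %r", name, alias)
            return {"status": "found", "matched_name": alias, "data": result}

    # Nothing matched: report what was attempted.
    return {"status": "not_found", "searched_for": name, "aliases_tried": tried}
```

Returning the list of tried aliases in the not-found case makes it easier to see why a venue was left unmatched.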
## Results
- "IEEE Conference on CVPR (CVPR)" → LEGITIMATE (was UNKNOWN)
- "International Conference on Computer Vision (ICCV)" → LEGITIMATE (was UNKNOWN)
- All 244 unit tests pass
- Mypy type checking passes
Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
1 parent 33bc432 commit 25a12a4
2 files changed (+94, -0 lines) under src/aletheia_probe, including its backends directory