Commit 5a5fd9b
fix: Add URL extraction to AsyncDBWriter to populate journal_urls table
The journal_urls table was remaining empty despite successful sync operations
because the AsyncDBWriter._batch_write_journals() method was not extracting
and storing URLs from journal data.
This fix adds comprehensive URL extraction logic that:
- Extracts URLs from top-level 'urls' field in journal records
- Extracts URLs from metadata['urls'] (Algerian Ministry format)
- Extracts URLs from metadata['website_url'] (Kscien format)
- Extracts URLs from metadata['source_url'] (general source URLs)
- Deduplicates URLs per journal using set() for efficiency
- Uses INSERT OR REPLACE SQL pattern consistent with existing code
- Follows database schema with proper journal_id foreign key relationships
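
The extraction steps listed above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the helper names `extract_urls` and `write_journal_urls`, the two-column `journal_urls(journal_id, url)` layout, and the tolerance for string-or-list values are all assumptions made for the example.

```python
import sqlite3
from typing import Any, Iterable, Set


def extract_urls(journal: dict) -> Set[str]:
    """Collect unique URLs from the fields named in the commit message."""
    urls: Set[str] = set()  # a set deduplicates URLs per journal

    def add(value: Any) -> None:
        # Accept a single string or a collection of strings; ignore anything else.
        if isinstance(value, str):
            candidates = [value]
        elif isinstance(value, (list, tuple, set)):
            candidates = [v for v in value if isinstance(v, str)]
        else:
            candidates = []
        for candidate in candidates:
            candidate = candidate.strip()
            if candidate:  # skip empty or whitespace-only entries
                urls.add(candidate)

    # Top-level 'urls' field.
    add(journal.get("urls"))

    # metadata['urls'], metadata['website_url'], metadata['source_url'].
    metadata = journal.get("metadata") or {}
    if isinstance(metadata, dict):
        for key in ("urls", "website_url", "source_url"):
            add(metadata.get(key))

    return urls


def write_journal_urls(conn: sqlite3.Connection, journal_id: int,
                       urls: Iterable[str]) -> None:
    """Store URLs with INSERT OR REPLACE (assumed schema: journal_id, url)."""
    conn.executemany(
        "INSERT OR REPLACE INTO journal_urls (journal_id, url) VALUES (?, ?)",
        [(journal_id, url) for url in sorted(urls)],
    )
```

With a primary key on `(journal_id, url)`, repeating the write for the same journal would replace existing rows rather than duplicate them, which is the behaviour the INSERT OR REPLACE pattern is meant to give.
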
Resolves issue where journal URL data was being collected by data sources
but lost during the batch write process, leaving journal_urls table empty.

1 parent bdf0b93, commit 5a5fd9b
1 file changed: 43 additions, 0 deletions
Added lines (new file numbering): 140, 210-242, 290-298 (diff content not preserved)