-2. **Extraction error boundaries** - Caught by `@extractor_error_handler` decorator; malformed data is logged but doesn't block the pipeline
-3. **Network resilience** - httpx timeout management (30 second timeout) prevents hanging; timeouts are logged as provider failures
-4. **API error handling** - Google Sheets API errors are logged with full context; transient failures can be retried by re-running the pipeline
+2. **Extraction error boundaries** - Caught by `@extractor_error_handler` decorator; malformed data is silently skipped without blocking the pipeline
+3. **Network resilience** - httpx timeout management (30 second timeout) prevents hanging; HTTP errors silently return None for graceful degradation
+4. **API error handling** - Google Sheets API errors are logged with full context at the application level; transient failures can be retried by re-running the pipeline
+5. **Graceful degradation** - Failed articles are silently skipped (exception caught), allowing the pipeline to process successfully extracted articles
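The added lines describe a decorator that catches extraction errors so that one malformed article does not stop the run. A minimal sketch of that pattern follows. Only the name `extractor_error_handler` comes from the diff; its body, the `extract_title` extractor, and the sample data are assumptions made for illustration, not the project's actual code.

```python
import functools
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def extractor_error_handler(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Assumed behavior: swallow any extraction error and return None."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return func(*args, **kwargs)
        except Exception:
            # Malformed input is skipped silently; logging is left to the caller.
            return None

    return wrapper


@extractor_error_handler
def extract_title(raw: Dict[str, Any]) -> str:
    # Hypothetical extractor: raises KeyError or AttributeError on bad input.
    return raw["title"].strip()


def extract_all(raw_items: Iterable[Dict[str, Any]]) -> Iterator[str]:
    # Failed items come back as None and are dropped; the rest flow onward.
    for raw in raw_items:
        title = extract_title(raw)
        if title is not None:
            yield title


titles = list(extract_all([{"title": " A "}, {}, {"title": None}, {"title": "B"}]))
print(titles)
```

Returning `None` keeps the decorator free of logging concerns, which matches the stated move toward silent skipping. The trade-off is that the caller can no longer tell why an item failed unless it logs the failure itself.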
 - "Batch write complete: X articles added to the sheet." - Load completion metric
+- "✅ No new articles found" - No-op scenario indicator

-These logs enable downstream monitoring, alerting, and audit trails—essential for operational pipelines.
+These logs enable downstream monitoring, alerting, and audit trails—essential for operational pipelines. Utility modules (`get_page.py`, `extractors.py`) delegate logging to `main.py` for a unified view.
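One way to read "delegate logging to `main.py`" is that utility functions report failure through their return value and only the entry point writes log records. The sketch below illustrates that split under stated assumptions: the `get_page` stand-in, the `run` function, the URLs, and the log message wording are all hypothetical and are not taken from the project.

```python
import logging
from typing import Iterable, Optional

# The single logger, owned by the entry point (main.py in the README's terms).
logger = logging.getLogger("main")


def get_page(url: str) -> Optional[str]:
    """Stand-in for a utility in get_page.py: no logging, None on failure."""
    fake_responses = {"https://example.com/ok": "<html>ok</html>"}
    return fake_responses.get(url)


def run(urls: Iterable[str]) -> int:
    """Entry-point loop: the only place where log records are emitted."""
    fetched = 0
    for url in urls:
        page = get_page(url)
        if page is None:
            # Illustrative message; the real pipeline's wording may differ.
            logger.warning("Fetch failed: %s", url)
            continue
        fetched += 1
    logger.info("Fetched %d page(s)", fetched)
    return fetched


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run(["https://example.com/ok", "https://example.com/missing"])
```

With this layout, changing log format or destination touches one module, at the cost of utilities being unable to record detail that is not carried in their return value.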

 ## Performance & Architecture

@@ -228,6 +238,7 @@ These logs enable downstream monitoring, alerting, and audit trails—essential
 - **Sequential processing** - Providers processed one at a time; can be parallelized if needed
 - **Generator-based streaming** - Articles flow through pipeline immediately after extraction (no batch buffering)
 - **Memory efficient** - Generators enable incremental processing without storing all articles in memory
+- **Centralized logging** - Single logging source in `main.py` provides unified observability across all pipeline stages
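The sequential, generator-based flow described in these bullets can be sketched as below. The function names `extract` and `load`, the provider data, and the `events` trace are hypothetical and exist only to make the ordering visible. The point is that each article reaches the consumer before the next one is extracted.

```python
from typing import Dict, Iterator, List

events: List[str] = []  # Trace of what happened, in order.


def extract(providers: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    # Providers are handled one at a time; each article is yielded as soon
    # as it is produced instead of being collected into a list first.
    for name, titles in providers.items():
        for title in titles:
            events.append(f"extract {title}")
            yield {"provider": name, "title": title}


def load(articles: Iterator[Dict[str, str]]) -> int:
    # Pulls one article at a time, so only one is held in memory here.
    count = 0
    for article in articles:
        events.append(f"load {article['title']}")
        count += 1
    return count


count = load(extract({"a": ["x", "y"], "b": ["z"]}))
print(count, events)
```

The interleaved trace (extract, load, extract, load, ...) is what distinguishes streaming from batch buffering, where all extract events would appear before the first load event.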