
Commit a3ac27c

feat: add configurable timeouts (#673)
## ℹ️ Description

- Related issues: #671, #658
- Introduces configurable timeout controls plus retry/backoff handling for flaky DOM operations.

We often see timeouts that are not reproducible in certain configurations. I suspect the timeout issues stem from a combination of internet speed, browser, OS, age of the computer, and the weather. This PR introduces a comprehensive config model to tweak timeouts.

## 📋 Changes Summary

- add TimeoutConfig to the main config/schema and expose timeouts in README/docs
- wire WebScrapingMixin, extractor, update checker, and browser diagnostics to honor the configurable timeouts and retries
- update translations/tests to cover the new behaviour and ensure lint/mypy/pyright pipelines remain green

### ⚙️ Type of Change

- [ ] 🐞 Bug fix (non-breaking change which fixes an issue)
- [x] ✨ New feature (adds new functionality without breaking existing usage)
- [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations)

## ✅ Checklist

- [x] I have reviewed my changes to ensure they meet the project's standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Centralized, configurable timeout system for web interactions, detection flows, publishing, and pagination.
  * Optional retry with exponential backoff for operations that time out.
* **Improvements**
  * Replaced fixed wait times with dynamic timeouts throughout workflows.
  * More informative timeout-related messages and diagnostics.
* **Tests**
  * New and expanded test coverage for timeout behavior, pagination, diagnostics, and retry logic.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
1 parent ac678ed commit a3ac27c

16 files changed: +972 -121 lines changed

CONTRIBUTING.md

Lines changed: 8 additions & 1 deletion
@@ -187,6 +187,14 @@ All Python files must start with SPDX license headers:
 - Use appropriate log levels (DEBUG, INFO, WARNING, ERROR)
 - Log important state changes and decision points
 
+#### Timeout configuration
+- The default timeout (`timeouts.default`) already wraps all standard DOM helpers (`web_find`, `web_click`, etc.) via `WebScrapingMixin._timeout/_effective_timeout`. Use it unless a workflow clearly needs a different SLA.
+- Reserve `timeouts.quick_dom` for transient overlays (shipping dialogs, payment prompts, toast banners) that should render almost instantly; call `self._timeout("quick_dom")` in those spots to keep the UI responsive.
+- For single selectors that occasionally need more headroom, pass an inline override instead of creating a new config key, e.g. `custom = self._timeout(override = 12.5); await self.web_find(..., timeout = custom)`.
+- Use `_timeout()` when you just need the raw configured value (with optional override); use `_effective_timeout()` when you rely on the global multiplier and retry backoff for a given attempt (e.g. inside `_run_with_timeout_retries`).
+- Add a new timeout key only when a recurring workflow has its own timing profile (pagination, captcha detection, publishing confirmations, Chrome probes, etc.). Whenever you add one, extend `TimeoutConfig`, document it in the sample `timeouts:` block in `README.md`, and explain it in `docs/BROWSER_TROUBLESHOOTING.md`.
+- Encourage users to raise `timeouts.multiplier` when everything is slow, and override existing keys in `config.yaml` before introducing new ones. This keeps the configuration surface minimal.
+
 #### Examples
 ```python
 def parse_duration(text: str) -> timedelta:
@@ -297,4 +305,3 @@ See the [LICENSE.txt](LICENSE.txt) file for our project's licensing. All source
 - Use the translation system for all output—**never hardcode German or other languages** in the code.
 - If you add or change a user-facing message, update the translation file and ensure that translation completeness tests pass (`tests/unit/test_translations.py`).
 - Review the translation guidelines and patterns in the codebase for correct usage.
-

README.md

Lines changed: 23 additions & 0 deletions
@@ -277,6 +277,27 @@ categories:
   Verschenken & Tauschen > Verleihen: 272/274
   Verschenken & Tauschen > Verschenken: 272/192
 
+# timeout tuning (optional)
+timeouts:
+  multiplier: 1.0 # Scale all timeouts (e.g. 2.0 for slower networks)
+  default: 5.0 # Base timeout for web_find/web_click/etc.
+  page_load: 15.0 # Timeout for web_open page loads
+  captcha_detection: 2.0 # Timeout for captcha iframe detection
+  sms_verification: 4.0 # Timeout for SMS verification banners
+  gdpr_prompt: 10.0 # Timeout when handling GDPR dialogs
+  publishing_result: 300.0 # Timeout for publishing status checks
+  publishing_confirmation: 20.0 # Timeout for publish confirmation redirect
+  pagination_initial: 10.0 # Timeout for first pagination lookup
+  pagination_follow_up: 5.0 # Timeout for subsequent pagination clicks
+  quick_dom: 2.0 # Generic short DOM timeout (shipping dialogs, etc.)
+  update_check: 10.0 # Timeout for GitHub update requests
+  chrome_remote_probe: 2.0 # Timeout for local remote-debugging probes
+  chrome_remote_debugging: 5.0 # Timeout for remote debugging API calls
+  chrome_binary_detection: 10.0 # Timeout for chrome --version subprocess
+  retry_enabled: true # Enables DOM retry/backoff when timeouts occur
+  retry_max_attempts: 2
+  retry_backoff_factor: 1.5
+
 # download configuration
 download:
   include_all_matching_shipping_options: false # if true, all shipping options matching the package size will be included
@@ -329,6 +350,8 @@ login:
   password: ""
 ```
 
+Slow networks or sluggish remote browsers often just need a higher `timeouts.multiplier`, while truly problematic selectors can get explicit values directly under `timeouts`. Remember to regenerate the schemas after changing the configuration model so editors stay in sync.
+
 ### <a name="ad-config"></a>2) Ad configuration
 
 Each ad is described in a separate JSON or YAML file with prefix `ad_<filename>`. The prefix is configurable in config file.
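
To make the interplay between `timeouts.multiplier` and per-key overrides in the README hunk concrete, here is a minimal sketch in plain Python. The `DEFAULTS` mapping copies a few values from the sample `timeouts:` block; the `resolve_timeouts` helper is hypothetical and not part of the project. It only illustrates the arithmetic, assuming the multiplier is applied uniformly to every timeout value.

```python
# Hypothetical helper: not part of kleinanzeigen-bot, it only illustrates the arithmetic.
# The values are copied from the sample `timeouts:` block in the README diff.
DEFAULTS = {
    "multiplier": 1.0,
    "default": 5.0,
    "page_load": 15.0,
    "pagination_initial": 10.0,
    "quick_dom": 2.0,
}


def resolve_timeouts(user_cfg: dict[str, float]) -> dict[str, float]:
    """Merge user overrides onto the defaults, then scale every timeout by the multiplier."""
    merged = {**DEFAULTS, **user_cfg}
    multiplier = merged.pop("multiplier")
    return {key: value * multiplier for key, value in merged.items()}


# Slow network: raise only the multiplier, every timeout doubles.
slow_network = resolve_timeouts({"multiplier": 2.0})

# One problematic selector: override a single key and leave the rest alone.
slow_pagination = resolve_timeouts({"pagination_initial": 20.0})

print(slow_network["page_load"])              # 30.0
print(slow_pagination["pagination_initial"])  # 20.0
print(slow_pagination["default"])             # 5.0
```

Raising the multiplier is the blunt tool for a uniformly slow setup; a per-key override keeps all other waits short.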

docs/BROWSER_TROUBLESHOOTING.md

Lines changed: 12 additions & 0 deletions
@@ -59,6 +59,18 @@ Please update your configuration to include --user-data-dir for remote debugging
 
 The bot will also provide specific instructions on how to fix your configuration.
 
+### Issue: Slow page loads or recurring TimeoutError
+
+**Symptoms:**
+- `_extract_category_from_ad_page` fails intermittently due to breadcrumb lookups timing out
+- Captcha/SMS/GDPR prompts appear right after a timeout
+- Requests to GitHub's API fail sporadically with timeout errors
+
+**Solutions:**
+1. Increase `timeouts.multiplier` in `config.yaml` (e.g. `2.0` doubles every timeout consistently).
+2. Override specific keys under `timeouts` (e.g. `pagination_initial: 20.0`) if only a single selector is problematic.
+3. Keep `retry_enabled` on so that DOM lookups are retried with exponential backoff.
+
 ## Common Issues and Solutions
 
 ### Issue 1: "Failed to connect to browser" with "root" error
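
The third solution in the troubleshooting hunk relies on retries with exponential backoff. The sketch below shows one way such a loop could look. It is not the project's `_run_with_timeout_retries`; the function name, its signature, and the reading of `retry_max_attempts` as "retries after the first try" are assumptions made for illustration only.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def run_with_timeout_retries(
    operation: Callable[[float], Awaitable[T]],
    *,
    base_timeout: float,
    multiplier: float = 1.0,
    retry_enabled: bool = True,
    retry_max_attempts: int = 2,
    retry_backoff_factor: float = 1.5,
) -> T:
    """Call operation(timeout); on TimeoutError try again with a longer timeout."""
    # Assumption: retry_max_attempts counts the retries after the first try.
    attempts = 1 + (retry_max_attempts if retry_enabled else 0)
    for attempt in range(attempts):
        # Exponential backoff: the timeout grows by the factor on every retry.
        timeout = base_timeout * multiplier * retry_backoff_factor ** attempt
        try:
            return await operation(timeout)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last timeout
    raise AssertionError("unreachable")


seen: list[float] = []


async def flaky_lookup(timeout: float) -> str:
    """Stand-in for a DOM lookup that only succeeds once the timeout reaches 10 s."""
    seen.append(timeout)
    if timeout < 10.0:
        raise TimeoutError(f"no element within {timeout} s")
    return "element"


result = asyncio.run(run_with_timeout_retries(flaky_lookup, base_timeout = 5.0))
print(result, seen)
```

With the default values from the README (`default: 5.0`, `retry_max_attempts: 2`, `retry_backoff_factor: 1.5`), the fake lookup is tried with 5.0, 7.5, and 11.25 seconds before it succeeds.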

schemas/config.schema.json

Lines changed: 135 additions & 0 deletions
@@ -359,6 +359,137 @@
       "title": "PublishingConfig",
       "type": "object"
     },
+    "TimeoutConfig": {
+      "properties": {
+        "multiplier": {
+          "default": 1.0,
+          "description": "Global multiplier applied to all timeout values.",
+          "minimum": 0.1,
+          "title": "Multiplier",
+          "type": "number"
+        },
+        "default": {
+          "type": "number",
+          "minimum": 0.0,
+          "default": 5.0,
+          "description": "Baseline timeout for DOM interactions.",
+          "title": "Default"
+        },
+        "page_load": {
+          "default": 15.0,
+          "description": "Page load timeout for web_open.",
+          "minimum": 1.0,
+          "title": "Page Load",
+          "type": "number"
+        },
+        "captcha_detection": {
+          "default": 2.0,
+          "description": "Timeout for captcha iframe detection.",
+          "minimum": 0.1,
+          "title": "Captcha Detection",
+          "type": "number"
+        },
+        "sms_verification": {
+          "default": 4.0,
+          "description": "Timeout for SMS verification prompts.",
+          "minimum": 0.1,
+          "title": "Sms Verification",
+          "type": "number"
+        },
+        "gdpr_prompt": {
+          "default": 10.0,
+          "description": "Timeout for GDPR/consent dialogs.",
+          "minimum": 1.0,
+          "title": "Gdpr Prompt",
+          "type": "number"
+        },
+        "publishing_result": {
+          "default": 300.0,
+          "description": "Timeout for publishing result checks.",
+          "minimum": 10.0,
+          "title": "Publishing Result",
+          "type": "number"
+        },
+        "publishing_confirmation": {
+          "default": 20.0,
+          "description": "Timeout for publish confirmation redirect.",
+          "minimum": 1.0,
+          "title": "Publishing Confirmation",
+          "type": "number"
+        },
+        "pagination_initial": {
+          "default": 10.0,
+          "description": "Timeout for initial pagination lookup.",
+          "minimum": 1.0,
+          "title": "Pagination Initial",
+          "type": "number"
+        },
+        "pagination_follow_up": {
+          "default": 5.0,
+          "description": "Timeout for subsequent pagination navigation.",
+          "minimum": 1.0,
+          "title": "Pagination Follow Up",
+          "type": "number"
+        },
+        "quick_dom": {
+          "default": 2.0,
+          "description": "Generic short timeout for transient UI.",
+          "minimum": 0.1,
+          "title": "Quick Dom",
+          "type": "number"
+        },
+        "update_check": {
+          "default": 10.0,
+          "description": "Timeout for GitHub update checks.",
+          "minimum": 1.0,
+          "title": "Update Check",
+          "type": "number"
+        },
+        "chrome_remote_probe": {
+          "default": 2.0,
+          "description": "Timeout for local remote-debugging probes.",
+          "minimum": 0.1,
+          "title": "Chrome Remote Probe",
+          "type": "number"
+        },
+        "chrome_remote_debugging": {
+          "default": 5.0,
+          "description": "Timeout for remote debugging API calls.",
+          "minimum": 1.0,
+          "title": "Chrome Remote Debugging",
+          "type": "number"
+        },
+        "chrome_binary_detection": {
+          "default": 10.0,
+          "description": "Timeout for chrome --version subprocesses.",
+          "minimum": 1.0,
+          "title": "Chrome Binary Detection",
+          "type": "number"
+        },
+        "retry_enabled": {
+          "default": true,
+          "description": "Enable built-in retry/backoff for DOM operations.",
+          "title": "Retry Enabled",
+          "type": "boolean"
+        },
+        "retry_max_attempts": {
+          "default": 2,
+          "description": "Max retry attempts when retry is enabled.",
+          "minimum": 1,
+          "title": "Retry Max Attempts",
+          "type": "integer"
+        },
+        "retry_backoff_factor": {
+          "default": 1.5,
+          "description": "Exponential factor applied per retry attempt.",
+          "minimum": 1.0,
+          "title": "Retry Backoff Factor",
+          "type": "number"
+        }
+      },
+      "title": "TimeoutConfig",
+      "type": "object"
+    },
     "UpdateCheckConfig": {
       "description": "Configuration for update checking functionality.\n\nAttributes:\n enabled: Whether update checking is enabled.\n channel: Which release channel to check ('latest' for stable, 'preview' for prereleases).\n interval: How often to check for updates (e.g. '7d', '1d').\n If the interval is invalid, too short (<1d), or too long (>30d),\n the bot will log a warning and use a default interval for this run:\n - 1d for 'preview' channel\n - 7d for 'latest' channel\n The config file is not changed automatically; please fix your config to avoid repeated warnings.",
       "properties": {
@@ -428,6 +559,10 @@
     "update_check": {
       "$ref": "#/$defs/UpdateCheckConfig",
       "description": "Update check configuration"
+    },
+    "timeouts": {
+      "$ref": "#/$defs/TimeoutConfig",
+      "description": "Centralized timeout configuration."
     }
   },
   "title": "Config",

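The `minimum` entries in the `TimeoutConfig` schema above reject values that would make a timeout meaningless, such as a backoff factor below 1.0. As a rough illustration of what those constraints mean for a user's `timeouts:` block, the following hand-rolled check compares values against a subset of the minimums copied from the schema. The `find_violations` helper is hypothetical; the project itself presumably enforces these limits through its config model rather than with code like this.

```python
# Subset of the `minimum` constraints from the TimeoutConfig schema in the diff above.
SCHEMA_MINIMUMS = {
    "multiplier": 0.1,
    "default": 0.0,
    "page_load": 1.0,
    "publishing_result": 10.0,
    "quick_dom": 0.1,
    "retry_max_attempts": 1,
    "retry_backoff_factor": 1.0,
}


def find_violations(timeouts: dict[str, float]) -> list[str]:
    """Return one message per value that falls below its schema minimum (hypothetical helper)."""
    problems = []
    for key, value in timeouts.items():
        minimum = SCHEMA_MINIMUMS.get(key)
        # Keys without a known minimum are skipped, not rejected.
        if minimum is not None and value < minimum:
            problems.append(f"timeouts.{key}: {value} is below the minimum {minimum}")
    return problems


for message in find_violations({"multiplier": 2.0, "page_load": 0.5, "retry_backoff_factor": 0.9}):
    print(message)
```

Here `page_load` and `retry_backoff_factor` are reported, while `multiplier: 2.0` passes because it is above its minimum of 0.1.
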
src/kleinanzeigen_bot/__init__.py

Lines changed: 20 additions & 10 deletions
@@ -573,8 +573,9 @@ def load_ad(self, ad_cfg_orig:dict[str, Any]) -> Ad:
 
     async def check_and_wait_for_captcha(self, *, is_login_page:bool = True) -> None:
         try:
+            captcha_timeout = self._timeout("captcha_detection")
             await self.web_find(By.CSS_SELECTOR,
-                "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']", timeout = 2)
+                "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']", timeout = captcha_timeout)
 
             if not is_login_page and self.config.captcha.auto_restart:
                 LOG.warning("Captcha recognized - auto-restart enabled, abort run...")
@@ -624,7 +625,8 @@ async def fill_login_data_and_send(self) -> None:
 
     async def handle_after_login_logic(self) -> None:
         try:
-            await self.web_find(By.TEXT, "Wir haben dir gerade einen 6-stelligen Code für die Telefonnummer", timeout = 4)
+            sms_timeout = self._timeout("sms_verification")
+            await self.web_find(By.TEXT, "Wir haben dir gerade einen 6-stelligen Code für die Telefonnummer", timeout = sms_timeout)
             LOG.warning("############################################")
             LOG.warning("# Device verification message detected. Please follow the instruction displayed in the Browser.")
             LOG.warning("############################################")
@@ -634,9 +636,12 @@ async def handle_after_login_logic(self) -> None:
 
         try:
             LOG.info("Handling GDPR disclaimer...")
-            await self.web_find(By.ID, "gdpr-banner-accept", timeout = 10)
+            gdpr_timeout = self._timeout("gdpr_prompt")
+            await self.web_find(By.ID, "gdpr-banner-accept", timeout = gdpr_timeout)
             await self.web_click(By.ID, "gdpr-banner-cmp-button")
-            await self.web_click(By.XPATH, "//div[@id='ConsentManagementPage']//*//button//*[contains(., 'Alle ablehnen und fortfahren')]", timeout = 10)
+            await self.web_click(By.XPATH,
+                "//div[@id='ConsentManagementPage']//*//button//*[contains(., 'Alle ablehnen und fortfahren')]",
+                timeout = gdpr_timeout)
         except TimeoutError:
             pass
 
@@ -724,7 +729,8 @@ async def publish_ads(self, ad_cfgs:list[tuple[str, Ad, dict[str, Any]]]) -> Non
             count += 1
 
             await self.publish_ad(ad_file, ad_cfg, ad_cfg_orig, published_ads, AdUpdateStrategy.REPLACE)
-            await self.web_await(self.__check_publishing_result, timeout = 5 * 60)
+            publish_timeout = self._timeout("publishing_result")
+            await self.web_await(self.__check_publishing_result, timeout = publish_timeout)
 
             if self.config.publishing.delete_old_ads == "AFTER_PUBLISH" and not self.keep_old_ads:
                 await self.delete_ad(ad_cfg, published_ads, delete_old_ads_by_title = False)
@@ -924,7 +930,8 @@ async def publish_ad(self, ad_file:str, ad_cfg:Ad, ad_cfg_orig:dict[str, Any], p
         # wait for payment form if commercial account is used
         #############################
         try:
-            await self.web_find(By.ID, "myftr-shppngcrt-frm", timeout = 2)
+            short_timeout = self._timeout("quick_dom")
+            await self.web_find(By.ID, "myftr-shppngcrt-frm", timeout = short_timeout)
 
             LOG.warning("############################################")
             LOG.warning("# Payment form detected! Please proceed with payment.")
@@ -934,7 +941,8 @@ async def publish_ad(self, ad_file:str, ad_cfg:Ad, ad_cfg_orig:dict[str, Any], p
         except TimeoutError:
             pass
 
-        await self.web_await(lambda: "p-anzeige-aufgeben-bestaetigung.html?adId=" in self.page.url, timeout = 20)
+        confirmation_timeout = self._timeout("publishing_confirmation")
+        await self.web_await(lambda: "p-anzeige-aufgeben-bestaetigung.html?adId=" in self.page.url, timeout = confirmation_timeout)
 
         # extract the ad id from the URL's query parameter
         current_url_query_params = urllib_parse.parse_qs(urllib_parse.urlparse(self.page.url).query)
@@ -986,7 +994,8 @@ async def update_ads(self, ad_cfgs:list[tuple[str, Ad, dict[str, Any]]]) -> None
             count += 1
 
             await self.publish_ad(ad_file, ad_cfg, ad_cfg_orig, published_ads, AdUpdateStrategy.MODIFY)
-            await self.web_await(self.__check_publishing_result, timeout = 5 * 60)
+            publish_timeout = self._timeout("publishing_result")
+            await self.web_await(self.__check_publishing_result, timeout = publish_timeout)
 
         LOG.info("############################################")
         LOG.info("DONE: updated %s", pluralize("ad", count))
@@ -1080,6 +1089,7 @@ async def __set_special_attributes(self, ad_cfg:Ad) -> None:
             LOG.debug("Successfully set attribute field [%s] to [%s]...", special_attribute_key, special_attribute_value_str)
 
     async def __set_shipping(self, ad_cfg:Ad, mode:AdUpdateStrategy = AdUpdateStrategy.REPLACE) -> None:
+        short_timeout = self._timeout("quick_dom")
         if ad_cfg.shipping_type == "PICKUP":
             try:
                 await self.web_click(By.ID, "radio-pickup")
@@ -1091,7 +1101,7 @@ async def __set_shipping(self, ad_cfg:Ad, mode:AdUpdateStrategy = AdUpdateStrate
             if mode == AdUpdateStrategy.MODIFY:
                 try:
                     # when "Andere Versandmethoden" is not available, go back and start over new
-                    await self.web_find(By.XPATH, '//dialog//button[contains(., "Andere Versandmethoden")]', timeout = 2)
+                    await self.web_find(By.XPATH, '//dialog//button[contains(., "Andere Versandmethoden")]', timeout = short_timeout)
                 except TimeoutError:
                     await self.web_click(By.XPATH, '//dialog//button[contains(., "Zurück")]')
 
@@ -1120,7 +1130,7 @@ async def __set_shipping(self, ad_cfg:Ad, mode:AdUpdateStrategy = AdUpdateStrate
                     # (important for mode = UPDATE)
                     await self.web_find(By.XPATH,
                         '//input[contains(@placeholder, "Versandkosten (optional)")]',
-                        timeout = 2)
+                        timeout = short_timeout)
                 except TimeoutError:
                     await self.web_click(By.XPATH, '//*[contains(@id, "INDIVIDUAL") and contains(@data-testid, "Individueller Versand")]')
 
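
The hunks in `src/kleinanzeigen_bot/__init__.py` all follow one pattern: look up a named timeout with `self._timeout("<key>")`, pass it to `web_find`/`web_click`/`web_await`, and treat `TimeoutError` as "element not present" where the element is optional. The sketch below reproduces that pattern with stand-in classes. `TimeoutSettings`, `TimeoutAwareBot`, and the fake `web_find` are hypothetical, and the `_effective_timeout` formula (configured value times multiplier times backoff per attempt) is an assumption based on the wording in `CONTRIBUTING.md`, not the project's actual implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutSettings:
    """Hypothetical stand-in for the TimeoutConfig model (only a few keys)."""
    multiplier: float = 1.0
    default: float = 5.0
    quick_dom: float = 2.0
    retry_backoff_factor: float = 1.5


class TimeoutAwareBot:
    """Hypothetical stand-in for the bot; mirrors only the helper names used in the diff."""

    def __init__(self, settings: TimeoutSettings) -> None:
        self.settings = settings

    def _timeout(self, key: str = "default", override: Optional[float] = None) -> float:
        # Raw configured value, unless the caller passes an inline override.
        return override if override is not None else getattr(self.settings, key)

    def _effective_timeout(self, key: str = "default", *, attempt: int = 0) -> float:
        # Assumption: configured value scaled by the global multiplier and per-attempt backoff.
        return self._timeout(key) * self.settings.multiplier * self.settings.retry_backoff_factor ** attempt

    async def web_find(self, selector: str, *, timeout: float) -> str:
        # Fake lookup: pretend the element never appears and time out immediately.
        raise TimeoutError(f"{selector} not found within {timeout} s")

    async def payment_form_present(self) -> bool:
        # Same shape as the diff: probe an optional element with the short timeout.
        try:
            await self.web_find("#myftr-shppngcrt-frm", timeout = self._timeout("quick_dom"))
            return True
        except TimeoutError:
            return False


bot = TimeoutAwareBot(TimeoutSettings(multiplier = 2.0))
print(bot._timeout("quick_dom"))                        # 2.0
print(bot._timeout(override = 12.5))                    # 12.5
print(bot._effective_timeout("default", attempt = 1))   # 15.0
print(asyncio.run(bot.payment_form_present()))          # False
```

Naming the timeout at the call site (`quick_dom`, `gdpr_prompt`, and so on) is what lets a user tune one workflow in `config.yaml` without touching the code.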
