-
Notifications
You must be signed in to change notification settings - Fork 63
fix: eliminate async safety violations and migrate to pathlib #697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Eliminate all blocking I/O operations in async contexts and modernize file path handling by migrating from os.path to pathlib.Path. BREAKING CHANGE: None - internal refactoring only - Enable ASYNC210, ASYNC230, ASYNC240, ASYNC250 Ruff rules - Wrap blocking urllib.request.urlopen() in run_in_executor - Wrap blocking file operations (open, write) in run_in_executor - Replace blocking os.path calls with async helpers using run_in_executor - Replace blocking input() with await ainput() - Migrate extract.py from os.path to pathlib.Path - Use Path() constructor and / operator for path joining - Use Path.mkdir(), Path.rename() in executor instead of os functions - Create mockable _path_exists() and _path_is_dir() helpers - Add debug logging for all file system operations
WalkthroughReplaced a blocking console input with an async prompt; introduced Path-based and async-friendly filesystem helpers; offloaded blocking network and file I/O to executors with a new sync image-download helper; updated browser-session prefs and port checks to use async/executor semantics; and revised unit tests to use Path and async-friendly flows. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Init as __init__.py
participant Extract as AdExtractor
participant Executor as Event Loop Executor
participant Disk as Filesystem
participant Net as Network
App->>Init: publish_ad()
activate Init
Init->>App: await ainput()
note right of Init `#DFF2E0`: Non-blocking user prompt
Init-->>App: payment response
deactivate Init
App->>Extract: _extract_ad_page_info_with_directory_handling(Path, ad_id)
activate Extract
Extract->>Extract: await _exists/_isdir
rect rgb(235,245,255)
loop for each image
Extract->>Executor: run_in_executor(_download_and_save_image_sync, url, dir, prefix, nr)
Executor->>Net: HTTP GET image
Net-->>Executor: image bytes / error
Executor->>Disk: write file (sync)
Disk-->>Executor: saved path / error
Executor-->>Extract: saved path or None
end
end
Extract->>Executor: run_in_executor(mkdir/rename/rmtree)
Executor->>Disk: perform filesystem ops
Disk-->>Executor: result
Executor-->>Extract: done
Extract-->>App: return (AdPartial, Path)
deactivate Extract
Estimated code review effort🎯 4 (Complex) | ⏱️ ~30–90 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (3)
🧰 Additional context used📓 Path-based instructions (2)src/kleinanzeigen_bot/**/*.py⚙️ CodeRabbit configuration file
Files:
tests/**/*.py⚙️ CodeRabbit configuration file
Files:
🧬 Code graph analysis (2)src/kleinanzeigen_bot/extract.py (2)
src/kleinanzeigen_bot/__init__.py (1)
🪛 Ruff (0.14.5)src/kleinanzeigen_bot/extract.py74-74: Unused Remove unused (RUF100) tests/unit/test_extract.py74-74: Unused Remove unused (RUF100) 90-90: Unused Remove unused (RUF100) 112-112: Unused Remove unused (RUF100) 128-128: Unused Remove unused (RUF100) 149-149: Unused Remove unused (RUF100) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
🔇 Additional comments (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #697 +/- ##
==========================================
+ Coverage 84.24% 86.40% +2.15%
==========================================
Files 20 20
Lines 2019 2052 +33
Branches 393 393
==========================================
+ Hits 1701 1773 +72
+ Misses 182 142 -40
- Partials 136 137 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/test_extract.py (1)
970-1024: Reduce implementation‑specific assertions indownload_adtestsThe updated
download_adtests correctly mock_path_exists/_path_is_dirand_extract_ad_page_info_with_directory_handling, and they already assert on the computed YAML path and the dict passed tosave_dict(the observable behavior that matters).The additional assertions around
Path.mkdir/Path.rename/shutil.rmtreecall counts are tightly coupled to the current internal implementation and will make future refactors of directory handling harder (especially since directory logic is now explicitly pushed down into_extract_ad_page_info_with_directory_handling).Consider simplifying these tests by:
- Keeping the assertions on
mock_extract_with_dirandmock_save_dict(behavioral contract).- Dropping or greatly relaxing the direct
assert_not_called/call_countchecks onmock_mkdir,mock_rename, andmock_rmtree, unless you specifically want to pin down this internal behavior.That keeps the tests aligned with “test observable behavior, not implementation” and makes them more robust to future path‑handling refactors.
Also applies to: 1025-1078, 1080-1132, 1134-1191
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
pyproject.tomlis excluded by none and included by nonesrc/kleinanzeigen_bot/resources/translations.de.yamlis excluded by none and included by none
📒 Files selected for processing (6)
src/kleinanzeigen_bot/__init__.py(2 hunks)src/kleinanzeigen_bot/extract.py(7 hunks)src/kleinanzeigen_bot/utils/web_scraping_mixin.py(3 hunks)tests/unit/test_extract.py(10 hunks)tests/unit/test_init.py(1 hunks)tests/unit/test_web_scraping_mixin.py(6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
⚙️ CodeRabbit configuration file
tests/**/*.py: TESTING RULES:
- NEVER access live website in tests (bot detection risk)
- Use @patch for web operations in tests
- Use test fixtures for browser automation
- Test Pydantic models without web scraping
- Mock all web operations in tests
- Use pytest markers: unit, integration, smoke
- Unit tests: fast, isolated, no external dependencies
- Integration tests: use mocks, test with external dependencies
- Smoke tests: critical path, no mocks, no browser (NOT E2E tests)
- All test code must be in English
- Test observable behavior, not implementation
- Use fakes/dummies instead of mocks in smoke tests
- Focus on minimal health checks, not full user workflows
- Include SPDX license headers
- Use descriptive test names in English
Files:
tests/unit/test_web_scraping_mixin.pytests/unit/test_extract.pytests/unit/test_init.py
src/kleinanzeigen_bot/**/*.py
⚙️ CodeRabbit configuration file
src/kleinanzeigen_bot/**/*.py: CRITICAL RULES FOR KLEINANZEIGEN BOT:
- ALL code, comments, and text MUST be in English
- User-facing messages MUST use translation system (_()) function
- NEVER access live website in tests (bot detection risk)
- Use WebScrapingMixin for browser automation
- Handle TimeoutError for all web operations
- Use ensure() for critical validations
- Don't add features until explicitly needed
- Keep solutions simple and straightforward
- Use async/await for I/O operations
- Follow Pydantic model patterns
- Use proper error handling and logging
- Test business logic separately from web scraping
- Include SPDX license headers on all Python files
- Use type hints for all function parameters and return values
- Use structured logging with context
Files:
src/kleinanzeigen_bot/__init__.pysrc/kleinanzeigen_bot/utils/web_scraping_mixin.pysrc/kleinanzeigen_bot/extract.py
🧬 Code graph analysis (4)
tests/unit/test_extract.py (1)
src/kleinanzeigen_bot/extract.py (6)
_path_exists(30-32)_path_is_dir(35-37)_exists(40-43)_isdir(46-49)AdExtractor(52-674)_download_and_save_image_sync(91-102)
src/kleinanzeigen_bot/__init__.py (1)
src/kleinanzeigen_bot/utils/misc.py (1)
ainput(121-122)
src/kleinanzeigen_bot/utils/web_scraping_mixin.py (2)
src/kleinanzeigen_bot/extract.py (1)
_exists(40-43)src/kleinanzeigen_bot/utils/misc.py (1)
ensure(19-49)
src/kleinanzeigen_bot/extract.py (1)
src/kleinanzeigen_bot/utils/web_scraping_mixin.py (1)
_exists(134-135)
🪛 Ruff (0.14.5)
tests/unit/test_web_scraping_mixin.py
101-101: Unused noqa directive (non-enabled: PLC0415, PLC2701)
Remove unused noqa directive
(RUF100)
tests/unit/test_extract.py
69-69: Dynamically typed expressions (typing.Any) are disallowed in tmp_path
(ANN401)
72-72: Unused noqa directive (non-enabled: PLC0415, PLC2701)
Remove unused noqa directive
(RUF100)
85-85: Dynamically typed expressions (typing.Any) are disallowed in tmp_path
(ANN401)
88-88: Unused noqa directive (non-enabled: PLC0415, PLC2701)
Remove unused noqa directive
(RUF100)
108-108: Dynamically typed expressions (typing.Any) are disallowed in tmp_path
(ANN401)
110-110: Unused noqa directive (non-enabled: PLC0415, PLC2701)
Remove unused noqa directive
(RUF100)
124-124: Dynamically typed expressions (typing.Any) are disallowed in tmp_path
(ANN401)
126-126: Unused noqa directive (non-enabled: PLC0415, PLC2701)
Remove unused noqa directive
(RUF100)
145-145: Dynamically typed expressions (typing.Any) are disallowed in tmp_path
(ANN401)
147-147: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
173-173: Dynamically typed expressions (typing.Any) are disallowed in tmp_path
(ANN401)
src/kleinanzeigen_bot/__init__.py
940-940: Unused noqa directive (non-enabled: ASYNC240)
Remove unused noqa directive
(RUF100)
src/kleinanzeigen_bot/extract.py
41-41: Unused noqa directive (non-enabled: ASYNC240)
Remove unused noqa directive
(RUF100)
47-47: Unused noqa directive (non-enabled: ASYNC240)
Remove unused noqa directive
(RUF100)
100-100: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build (ubuntu-latest, 3.14, true)
- GitHub Check: build (ubuntu-latest, 3.10, false)
- GitHub Check: build (macos-15-intel, 3.14, true)
- GitHub Check: build (windows-latest, 3.10, false)
- GitHub Check: build (windows-latest, 3.14, true)
- GitHub Check: build (macos-15-intel, 3.10, false)
- GitHub Check: build (macos-latest, 3.10, false)
🔇 Additional comments (3)
tests/unit/test_init.py (1)
641-652: Path-based config write intest_verify_commandlooks goodUsing
Path(tmp_path) / "config.yaml"withwrite_text(..., encoding="utf-8")is clear, encoding-safe, and aligns with the move toward pathlib-based IO. No issues found.src/kleinanzeigen_bot/__init__.py (1)
1108-1112: Formatting-only change in__set_shippingis fineThe wrapped
web_findcall keeps the same arguments and control flow; no behavioral change or async-safety concern here.tests/unit/test_web_scraping_mixin.py (1)
10-10: Path/async adjustments in browser prefs and extensions tests look good
- Adding
import asyncioand usingrun_in_executorintest_browser_extension_loadingto checkPath(ext_path).exists/.is_dirkeeps those filesystem checks off the event loop and aligns with the async-safety pattern used in the main code.- Switching to
Path.write_text/read_textwith explicit encodings for the preferences and session state files (prefs_fileandstate_file) simplifies the tests and mirrors the pathlib-based IO in the src modules.These changes are straightforward, English-only, and keep the tests simple and maintainable.
Also applies to: 790-812, 919-921, 1001-1001, 1013-1014
2acb8f4 to
0d84639
Compare
Coverage improvements: - Add tests for image download with None URLs (lines 124-136) - Add tests for directory handling scenarios (lines 420-454) - Test deletion when final_dir exists - Test renaming when rename_existing_folders=True - Test using existing directory when renaming disabled - Simplify download_ad tests to focus on observable behavior - Improve type annotations (tmp_path: Path instead of Any) - Coverage increased to 87.70% Async safety improvements: - Remove unused # noqa: ASYNC240 from _exists, _isdir, ainput - Eliminate TOCTOU race in directory creation using mkdir(exist_ok=True) - Offload dicts.save_dict to executor (blocking I/O) Path handling improvements: - Migrate to pathlib.Path for OS-agnostic file operations - Replace string concatenation with Path operations - Use Path.name instead of rsplit for filename extraction Exception handling improvements: - Narrow exception handling in _download_and_save_image_sync - Catch only expected errors (URLError, HTTPError, OSError, shutil.Error) - Don't swallow programming errors Translation updates: - Update German translation for directory creation message
0d84639 to
ac12b3d
Compare
ℹ️ Description
Eliminate all blocking I/O operations in async contexts and modernize file path handling by migrating from os.path to pathlib.Path.
📋 Changes Summary
⚙️ Type of Change
Select the type(s) of change(s) included in this pull request:
✅ Checklist
Before requesting a review, confirm the following:
pdm run test).pdm run format).pdm run lint).By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Summary by CodeRabbit
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.