Add include URL CLI filter #63
Walkthrough
A new CLI option
Changes
Sequence Diagram
sequenceDiagram
    actor User
    participant CLI as CLI Parser
    participant Scraper
    participant Validator as is_valid_link
    User->>CLI: --include-url /docs --include-url /api
    CLI->>CLI: Parse & collect patterns
    CLI->>Scraper: Initialize with include_url_patterns=["/docs", "/api"]
    Scraper->>Scraper: Store include_url_patterns
    loop During scraping
        Scraper->>Validator: is_valid_link(url)
        alt include_url_patterns provided
            Validator->>Validator: Check if url contains any pattern
            Validator-->>Scraper: Valid if match found
        else include_url_patterns empty
            Validator-->>Scraper: Valid (no restriction)
        end
    end
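The validation branch in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the code from the PR: the function signature and the example URLs are assumptions, and the real is_valid_link presumably applies other checks (exclusions, domain rules) that are omitted here.

```python
from typing import Sequence


def is_valid_link(url: str, include_url_patterns: Sequence[str] = ()) -> bool:
    """Hypothetical stand-in covering only the include-pattern branch."""
    if not include_url_patterns:
        # No include patterns were given: the filter imposes no restriction.
        return True
    # Plain substring test: the URL is valid if it contains any pattern.
    return any(pattern in url for pattern in include_url_patterns)


patterns = ["/docs", "/api"]
print(is_valid_link("https://example.com/docs/intro", patterns))  # True
print(is_valid_link("https://example.com/blog/post", patterns))   # False
print(is_valid_link("https://example.com/blog/post"))             # True
```

Because the match is a substring test rather than a prefix or regex match, a pattern such as /api would also accept a URL like https://example.com/v2/api-status.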
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_cli.py (2)
tests/test_scraper.py (1)
🪛 Ruff (0.14.5)
tests/test_cli.py
61-61: Unused function argument: (ARG001)
106-106: Unused function argument: (ARG001)
151-151: Unused function argument: (ARG001)
225-225: Unused function argument: (ARG001)
282-282: Unused function argument: (ARG001)
336-336: Unused function argument: (ARG001)
337-337: Unused function argument: (ARG001)
338-338: Unused function argument: (ARG001)
340-340: Unused function argument: (ARG001)
341-341: Unused function argument: (ARG001)
342-342: Unused function argument: (ARG001)
343-343: Unused function argument: (ARG001)
344-344: Unused function argument: (ARG001)
345-345: Unused function argument: (ARG001)
356-356: Unused lambda argument: (ARG005)
356-356: Unused lambda argument: (ARG005)
357-357: Unused lambda argument: (ARG005)
357-357: Unused lambda argument: (ARG005)
358-358: Unused lambda argument: (ARG005)
358-358: Unused lambda argument: (ARG005)
tests/test_scraper.py
311-311: Unused function argument: (ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
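ARG001 and ARG005 warnings like these are common in test stubs that must accept arguments they never read. One usual way to quiet them, sketched below, is to prefix the unused names with an underscore, which Ruff's default dummy-variable-rgx setting treats as intentionally unused. The stub names and parameters here are hypothetical, since the flagged argument names are not shown in the report.

```python
def fake_fetch(_url, _timeout=None):
    """Hypothetical stub: mirrors the arity of the function it replaces."""
    # Underscore-prefixed parameters signal that they are deliberately unused.
    return "<html></html>"


# The same convention applies to lambda arguments (ARG005).
stubs = {"sleep": lambda _seconds: None}

print(fake_fetch("https://example.com", 5))  # <html></html>
print(stubs["sleep"](1))                     # None
```

Renaming only works when callers pass the arguments positionally. If a flagged argument is a pytest fixture requested for its side effects, the name has to stay, and pytest.mark.usefixtures or a targeted noqa comment is the more likely fix.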
🔇 Additional comments (11)
Summary by CodeRabbit
Release Notes
Adds an --include-url (-I) command-line option that restricts crawling to URLs containing at least one of the specified strings. The option can be given multiple times, once per inclusion pattern, and complements the existing exclusion options.
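A repeatable option of this kind can be sketched with argparse. This is an illustration only: the PR's actual CLI framework, program name, help text, and destination attribute are not shown here, so everything except the --include-url and -I flag names is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag names come from the release note.
    parser = argparse.ArgumentParser(prog="scraper")
    parser.add_argument(
        "-I",
        "--include-url",
        dest="include_url_patterns",
        action="append",  # each occurrence adds one pattern to the list
        default=[],       # an empty list means no include restriction
        metavar="PATTERN",
        help="only crawl URLs containing PATTERN (may be repeated)",
    )
    return parser


args = build_parser().parse_args(["--include-url", "/docs", "-I", "/api"])
print(args.include_url_patterns)  # ['/docs', '/api']
```

Using the append action keeps the patterns in the order given and lets the long and short forms be mixed freely on one command line.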