…xt to pyproject.toml
- Move Python source files from `src/` to `crawler_to_md/`
- Rename `main.py` to `crawler_to_md/cli.py` and update internal imports
- Update all imports in codebase to reflect new module layout
- Remove requirements.txt in favor of a PEP 621-compliant pyproject.toml with dependencies and CLI entrypoint
- Update .gitignore for Python/package build artifacts (.egg-info, .pytest_cache, .ruff_cache)
- Revise README to reflect new install and usage instructions
- Add initial test file under tests/
- Adjust code formatting and docstrings for improved clarity
This commit prepares the package for proper pip installation and CLI invocation as `crawler-to-md`.

…just dependencies
- Switch to dynamic versioning with setuptools_scm
- Update project description and author information
- Replace trafilatura with markitdown in dependencies
- Add setuptools_scm to build requirements
- Add tool.setuptools_scm configuration

…estPyPI
Adds a workflow that builds Python packages on release and publishes to PyPI. Includes optional publishing to TestPyPI when running on the main branch.

Replaced imports from src.* to crawler_to_md.* throughout test files for consistency with package structure. Adjusted whitespace and minor variable naming for code clarity and style. Added assertions for non-None results in scraper tests. No changes to test logic.

- Removed unused trafilatura import and duplicate imports
- Organized import statements
- Adjusted whitespace and formatting for improved readability in Scraper class

Removes redundant Scraper instantiation and adjusts error handling to use argparse's parser.error when no URL is provided.

…ency management
- Switch to multi-stage build with clear base, builder, and final stages
- Replace pip with uv for Python package installation
- Implement non-root user for improved security
- Improve caching and layer ordering for build efficiency
- Pass APP_VERSION via environment for setuptools-scm compatibility
- Set leaner final entrypoint using uv-installed binary
- Remove unused/obsolete instructions related to pip and Python path

…t path
Changes default --cache-folder value to ~/.cache/crawler-to-md and ensures user paths are expanded using os.path.expanduser.

…ot user
- Change cache directory from /app/cache to /home/app/.cache/crawler-to-md
- Ensure directory is owned by the app user and update VOLUME accordingly
- Create home directory for app user with -m flag
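
Two of the commits above touch CLI argument handling: one reports a missing URL through argparse's `parser.error`, the other defaults `--cache-folder` to `~/.cache/crawler-to-md` and expands it with `os.path.expanduser`. A minimal sketch of how those two pieces could fit together follows. The `--url` flag name, the help strings, and the `build_parser`/`parse_args` helpers are assumptions for illustration; the actual `crawler_to_md/cli.py` is not shown in this PR text and may differ.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="crawler-to-md")
    # Hypothetical flag name: the commits do not show how URLs are passed.
    parser.add_argument("--url", help="Start URL to crawl")
    parser.add_argument(
        "--cache-folder",
        default="~/.cache/crawler-to-md",  # default named in the commit message
        help="Folder used for cached data",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.url:
        # parser.error prints the usage line plus this message to stderr
        # and exits with status 2.
        parser.error("no URL provided")
    # Expand a leading "~" so the default and user-supplied paths both resolve.
    args.cache_folder = os.path.expanduser(args.cache_folder)
    return args
```

Using `parser.error` keeps the failure path consistent with argparse's own validation errors, so callers see the same usage output and exit status in both cases.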
Caution: Review failed. The pull request is closed.
Walkthrough
This update introduces a new
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI (crawler-to-md)
participant DatabaseManager
participant Scraper
participant ExportManager
User->>CLI (crawler-to-md): Run with URLs and options
CLI (crawler-to-md)->>DatabaseManager: Initialize with db_path
CLI (crawler-to-md)->>Scraper: Start scraping with URLs
Scraper->>DatabaseManager: Insert links and pages
Scraper->>ExportManager: Export markdown files
ExportManager->>User: Provide output file paths
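
The call order in the diagram above can be sketched in Python as follows. This is only an illustration of the flow: the three classes here are in-memory stand-ins that borrow the diagram's participant names, and every method name, parameter, and return value (`insert_page`, `export_markdown`, `start_scraping`, `run`, `output_folder`) is hypothetical rather than taken from the package.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseManager:
    """Hypothetical stand-in: keeps pages in memory instead of a database file."""
    db_path: str
    pages: dict = field(default_factory=dict)

    def insert_page(self, url: str, content: str) -> None:
        self.pages[url] = content


class ExportManager:
    """Hypothetical stand-in: returns the paths it would write, without touching disk."""

    def __init__(self, db: DatabaseManager) -> None:
        self.db = db

    def export_markdown(self, output_folder: str) -> list:
        return [f"{output_folder}/{i}.md" for i, _ in enumerate(self.db.pages)]


class Scraper:
    """Hypothetical stand-in: records a placeholder page per URL, with no network access."""

    def __init__(self, db: DatabaseManager, exporter: ExportManager) -> None:
        self.db = db
        self.exporter = exporter

    def start_scraping(self, urls: list, output_folder: str) -> list:
        for url in urls:
            # Scraper ->> DatabaseManager: insert links and pages
            self.db.insert_page(url, f"# placeholder for {url}")
        # Scraper ->> ExportManager: export markdown files
        return self.exporter.export_markdown(output_folder)


def run(urls: list, db_path: str, output_folder: str) -> list:
    # CLI ->> DatabaseManager: initialize with db_path
    db = DatabaseManager(db_path)
    # CLI ->> Scraper: start scraping with URLs
    scraper = Scraper(db, ExportManager(db))
    # ExportManager ->> User: output file paths
    return scraper.start_scraping(urls, output_folder)
```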
sequenceDiagram
participant GitHub
participant GitHub Actions
participant PyPI
participant TestPyPI
GitHub->>GitHub Actions: Publish Release Event
GitHub Actions->>GitHub Actions: Build and package Python project
GitHub Actions->>PyPI: Publish package (using secrets)
alt On main branch
GitHub Actions->>TestPyPI: Publish package (using TestPyPI secrets)
end
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Refactor
Chores