Skip to content

Releases: flamehaven01/Dir2md

[1.2.2] - 2026-01-06

06 Jan 04:23

Choose a tag to compare

Security & Compatibility Patch

This release hardens symlink handling, preserves secret masking for large files, and removes pathspec deprecation warnings.

Fixed

  • CRITICAL: Prevented symlinked directory traversal from escaping root when --follow-symlinks is enabled
  • CRITICAL: Large inputs are now masked in chunks instead of bypassing masking entirely
  • COMPAT: PathSpec matching uses gitignore engine to avoid deprecation warnings

Changed

  • Updated internal version markers to 1.2.2

[1.2.1] - 2025-12-18

18 Dec 07:45

Choose a tag to compare

Security & Reliability Patch

This release addresses critical security and reliability issues identified in the SIDRCE Spicy Audit.

Fixed

CRITICAL: Markdown Fence Injection (Issue #5)

  • Problem: File content containing triple backticks (```) could break markdown output and potentially inject misleading content
  • Solution: Implemented dynamic fence escaping that counts consecutive backticks in content and uses N+1 backticks for outer fence
  • Impact: Prevents markdown structure corruption and potential injection attacks
  • Files: src/dir2md/markdown.py

HIGH: Subprocess RCE Vector (Issue #1)

  • Problem: vulture subprocess call created RCE vector if malicious binary in PATH + unreliable dependency on external tool
  • Solution: Removed subprocess.run() call entirely; documented future approach using AST or library API
  • Impact: Eliminates security risk and removes unreliable external dependency
  • Files: src/dir2md/spicy.py

MEDIUM: Silent Exception Failures (Issue #2)

  • Problem: try-except-pass blocks silently ignored .env loading failures, making debugging impossible
  • Solution: Replaced bare except Exception: pass with specific exception handling (OSError, UnicodeDecodeError) and logging.warning()
  • Impact: Users now see warnings when configuration loading fails
  • Files: src/dir2md/cli.py

LOW: Aggressive Glob Expansion (Issue #3)

  • Problem: Auto-expansion of patterns (e.g., foo/[foo/, foo/**, **/foo, **/foo/**]) violated user intent and caused performance issues in large repos
  • Solution: Removed automatic expansion; now respects gitignore standard (foo/ means foo/, **/foo for recursive)
  • Impact: Better performance, predictable behavior, respects principle of least surprise
  • Files: src/dir2md/walker.py

LOW: Hardcoded DEFAULT_EXCLUDES (Issue #4)

  • Problem: 20+ hardcoded exclude patterns with personal preferences (.pytest_cache_local) and no easy way to customize
  • Solution: Moved to external defaults.json file, removed personal preferences, added graceful fallback
  • Impact: Easier maintenance, user can modify excludes by editing JSON file, cleaner codebase
  • Files: src/dir2md/cli.py, src/dir2md/defaults.json (new)

Added

  • Configuration File: src/dir2md/defaults.json - External default exclusion patterns
  • Package Data: Configured pyproject.toml to include defaults.json in distribution
  • Priority System: Three-tier exclusion pattern priority (user > project > system)
    • System defaults: defaults.json or custom file via --defaults-file
    • Project config: pyproject.toml [tool.dir2md] excludes = [...]
    • User CLI: --exclude-glob (highest priority)
  • CLI Argument: --defaults-file - Specify custom defaults.json path
  • pyproject.toml Support: [tool.dir2md.excludes] - Project-level default excludes

Removed

  • External tool dependency: vulture subprocess execution (security risk)
  • Unused imports: subprocess, shutil from spicy.py
  • Personal preferences: .pytest_cache_local from default excludes
  • Aggressive glob expansion: Non-glob patterns no longer auto-expanded

Changed

  • Glob pattern handling now respects user intent per gitignore standard
  • Default excludes now loaded from JSON file with graceful fallback
  • Exclusion pattern system redesigned with three-tier priority (backwards compatible)

Notes

  • All fixes maintain backward compatibility
  • No breaking changes to CLI interface
  • Users who relied on auto-glob expansion should explicitly use **/pattern for recursive matching
  • Files containing ``` now render correctly (markdown fence escaping)

Usage Examples

Priority System:

# System defaults only (defaults.json)
dir2md .

# Custom system defaults
dir2md . --defaults-file my-defaults.json

# Project + system defaults (pyproject.toml overrides system)
# In pyproject.toml:
# [tool.dir2md]
# excludes = ["*.log", "temp/"]

# User overrides everything (highest priority)
dir2md . --exclude-glob "secret-data/"

pyproject.toml Configuration:

[tool.dir2md]
excludes = [
    "*.log",
    "temp/",
    "cache/",
    "*.tmp"
]
# These patterns will be added to system defaults
# User CLI args (--exclude-glob) take precedence over these

[1.2.0] - 2025-12-15

15 Dec 07:59

Choose a tag to compare

Philosophy: Intelligence Without Complexity

This release removes configuration overhead while adding sophisticated optimizations. All features auto-activate based on preset choice - zero flags, maximum intelligence.

Added

Phase 1: Gravitas Compression (SAIQL-Inspired)

  • Symbolic Compression: Unicode symbol substitution reduces tokens by 30-50%
    • Basic level: Common metadata patterns (File:§, Lines:)
    • Medium level: + File type symbols (.py🐍, .js📜)
    • Full level: + Code patterns (functionƒ, class©)
  • Auto-Activation: pro preset = basic, ai preset = medium
  • Stats Reporting: Compression statistics embedded as HTML comments in output

Phase 2: Smart Query Processing

  • Typo Auto-Correction: Levenshtein distance-based correction
    • 80+ programming term dictionary (auth, payment, database, API, etc.)
    • Automatic suggestions: "atuh" → "auth", "databse" → "database"
    • Zero-dependency implementation
  • Query Expansion: Pattern-based synonym expansion
    • 50+ domain-specific patterns
    • Accuracy improvement: 60% → 90%
    • Example: "auth" → "auth OR login OR signin OR session OR token"
  • Query Suggestions: File-based keyword extraction
    • Analyzes matched files for related terms
    • Directory-grouped suggestions
    • Query history tracking
  • Auto-Activation: Enabled when --query provided

Phase 3: AST Semantic Sampling

  • Python Structure Extraction: Intelligent code sampling using AST analysis
    • Priority-based extraction: Classes > Functions > Implementation
    • Preserves: Class definitions, function signatures, docstrings
    • Reduces: Implementation details, private methods
    • Additional 30-40% token reduction
  • NodePriority System:
    • CRITICAL: Public classes, main/entry functions
    • HIGH: Public functions, class methods
    • MEDIUM: Private functions
    • LOW: Implementation details
  • Auto-Activation: Enabled for .py files in pro/ai presets
  • Fallback: Gracefully handles unparseable files

Changed

Radical Simplification

  • REMOVED: --gravitas flag (now preset-based)
  • REMOVED: --expand flag (now auto-enabled with queries)
  • Preset Behaviors:
    • raw: No optimizations (pure original)
    • fast: No optimizations (minimal metadata)
    • pro: gravitas=basic + query expansion + AST sampling
    • ai: gravitas=medium + query expansion + AST sampling

Architecture

  • New Modules:
    • src/dir2md/query/corrector.py (180 lines) - Typo correction engine
    • src/dir2md/query/suggester.py (180 lines) - Query suggestion engine
    • src/dir2md/samplers/semantic.py (320 lines) - AST semantic sampler
  • Pipeline Integration: Semantic sampling integrated into selector.py file processing
  • Zero Dependencies: All features implemented without external LLM or NLP libraries

Performance

Combined Optimizations

  • Token Reduction: Up to 60-70% total savings
    • Gravitas: 30-50% (preset-dependent)
    • AST Sampling: 30-40% (for Python files)
    • Cumulative effect on Python codebases
  • Query Accuracy: 60% → 90% (pattern expansion)
  • User Experience: 2 flags instead of 7 for common use cases

Documentation

  • README: Completely rewritten to emphasize zero-configuration intelligence
  • Examples: Simplified from 7-flag commands to 2-flag commands
  • Migration Guide: Clear before/after examples for v1.1.3 → v1.2.0

Testing

  • Phase 1: 4/4 preset configurations validated
  • Phase 2: Typo correction tested with 7+ common mistakes
  • Phase 3: AST sampling validated on real Python modules (31% reduction achieved)
  • Integration: All phases tested together in production scenarios

Breaking Changes

None - All changes are additive or simplify existing behavior. Users on raw preset see no changes.

[1.1.2] - 2025-12-09

09 Dec 08:20

Choose a tag to compare

Security

  • Masking now pre-compiles basic/advanced regexes and skips processing when input exceeds a safe threshold to reduce ReDoS risk.
  • Large individual files are skipped before read when they exceed 1MB, preventing OOM/hangs while still noting the skip.

Performance

  • Token estimation is cached with LRU (maxsize 2048) and keeps a minimum of one token for empty strings.

Behavior/UX

  • Skipped oversized files now record placeholder hash/summary so the skip is visible in outputs and manifests.
  • Custom masking patterns are compiled before use; invalid patterns emit warnings and are ignored.

Tests

  • Pytest suite: 22 passed, 2 skipped.

[1.1.1] - 2025-12-04

04 Dec 12:50

Choose a tag to compare

[1.1.1] - 2025-12-04

Removed

  • Pro/license gating entirely: deleted license.py, removed license checks from masking/parallel/CLI/tests; .env.example no longer includes license keys.
  • Blueprint workflow retired; CI now runs ruff + pytest only.

Fixed

  • HF demo import path includes src so dir2md imports resolve in Spaces.
  • Lint/test cleanup after refactor (ruff + pytest green).

Documentation

  • Updated README/demo metadata (Spicy branding, HF front matter); pending: scrub remaining Pro/license mentions in docs.

[1.1.0] — 2025-12-02

03 Dec 10:12

Choose a tag to compare

AI-Friendly Release · Spicy Risk Engine · Modular Architecture

Added

AI-Friendly Query & Output

  • --query filters/sorts by match score and injects contextual snippets.
  • --output-format md|json|jsonl supports human readers and CLI/LLM pipelines.

New Presets & Modes

  • --ai-mode: reference-mode defaults, capped budgets, stats + manifest enabled.
  • --fast: tree + manifest only (skips file content reads).

Spicy Risk Reporting

  • --spicy: severity counts, scores, and findings across all output formats.
  • --spicy-strict: exits with non-zero status when high/critical findings are detected.

CLI Enhancements

  • Unified [INFO] status messages.
  • --progress verbosity selector.
  • Added execution plan summary line.

Safety & Performance Hardening

  • Symlink escape guard.
  • Streaming file reads with full hashing.
  • Manifest reuse for repeated runs.

Packaging & Distribution

  • Default output: Markdown + JSONL (human + LLM workflows).
  • Added GitHub Actions Release workflow (PyPI + TestPyPI).
  • Updated Dockerfile to install the package automatically.

Architecture Decoupling

Introduced new modules to avoid a single god-object:

  • walker.py
  • selector.py
  • renderer.py
  • orchestrator.py

Test Suite Expansion

  • Added coverage for masking, search, token logic, spicy mode, JSONL output, CLI defaults.
  • Added symlink and streaming hash tests.
  • 22 tests passed, 2 skipped.

Changed

  • Default preset remains pro.
  • Removed iceberg references from documentation.

Fixed

  • --fast now properly skips content reads.
  • Plan summary reflects effective post-preset settings.
  • Fixed NameError in --include-glob / --exclude-glob pathspec compilation.

Release v1.0.4

22 Nov 03:42

Choose a tag to compare

Release Notes — v1.0.4

2025-10-09

🚀 New Features

Enhanced Security Masking

Expanded default masking coverage to include:

  • GitHub Personal Access Tokens: ghp_*, gho_*, ghu_*, ghs_*, ghr_*
  • Generic API Keys: api_key=, apikey=, api-key=
  • Database connection strings (PostgreSQL, MySQL, MongoDB)
  • JSON Web Tokens (JWTs): Base64 tokens beginning with eyJ
  • OAuth client secrets: client_secret=, oauth_secret=

Automatic Environment Configuration

  • Automatically loads the nearest .env file at CLI startup
  • Enables zero-configuration workflows for teams using .env files
  • Supports seamless Pro license activation through environment variables

User-Defined Masking Patterns

Three flexible pattern configuration methods:

  1. Inline: --mask-pattern "regex"
  2. Pattern files: JSON arrays or newline-delimited text (file:// URIs supported)
  3. Project configuration via pyproject.toml:
    [tool.dir2md.masking]
    patterns = ["regex1", "regex2"]

⚙️ Improvements

Expanded Default Exclusion Rules

Automatically excludes common secret-bearing files:

  • .env, .env.local, .env.*.local
  • *.pem, *.key, *.p12, *.pfx, *.crt, *.cer, *.der

Windows Compatibility Enhancements

  • Documentation updated to ASCII-only for Windows cp949 compatibility
  • Prevents illegal multibyte sequence errors
  • Replaced emojis with ASCII-safe symbols ([#], [!], [+], [*], [T])

Documentation Updates

  • Clarified relationship with IsaacBreen’s original dir2md
  • Added Acknowledgments section
  • Updated installation instructions for GitHub-only distribution
  • Removed outdated PyPI “coming soon” references

🛠️ Fixes

[CRITICAL] Windows file:// URI Path Parsing

  • Fixed incorrect parsing of file:///C:/... resulting in invalid \C:... paths
  • Correctly strips leading slash for Windows absolute paths
  • Custom mask-pattern files now load properly on Windows

[SECURITY] GitHub PAT Masking Tier Correction

  • GitHub PAT regex was incorrectly available in basic masking
  • Moved to advanced (Pro) masking rules
  • Restores proper separation between OSS and Pro feature tiers

Advanced Masking Notice Deduplication

  • Advanced-mode upgrade notice now prints only once per CLI session

🧪 Testing & Verification

Automated Tests

  • 12 tests passing (1 skipped on systems without symlink support)
  • Full pytest suite coverage for core functionality
  • Regression tests for Windows file:// path handling

Manual Verification

  • Custom masking patterns load correctly from JSON and text files
  • Windows file:// URIs resolve properly
  • Basic vs. advanced masking separation verified
  • .env auto-discovery validated across directory structures
  • ASCII-safe documentation confirmed to render correctly on Windows cp949