@butschster butschster commented Mar 11, 2025

This PR adds a powerful content sanitization system and a PHP-to-documentation transformer to the context generator.

1. Content Sanitization System

Introduces a comprehensive system for sanitizing and transforming content:

CommentInsertionRule: Adds structured comments to code blocks

{
  "documents": [{
    "description": "Sanitized API Code",
    "outputPath": "docs/sanitized-api.md",
    "sources": [{
      "type": "file",
      "sourcePaths": ["src/Api"],
      "modifiers": [{
        "name": "sanitizer",
        "options": {
          "rules": [{
            "type": "comment",
            "fileHeaderComment": "WARNING: This file contains sensitive operations",
            "methodComment": "Security critical method",
            "frequency": 0
          }]
        }
      }]
    }]
  }]
}
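To make the effect of this rule concrete, here is a minimal Python sketch of what comment insertion could look like. It is not the PR's implementation: the `insert_comments` function, the `//` comment style, the placement of the header right after `<?php`, and the regex used to spot method declarations are all assumptions made for illustration. The `frequency` option is deliberately left out because its semantics are not described here.

```python
import re

# Hypothetical pattern for PHP method declarations (an assumption, not the PR's logic).
METHOD_RE = re.compile(
    r"^(\s*)(?:(?:public|protected|private|static|final|abstract)\s+)*function\s+\w+"
)


def insert_comments(code: str, file_header_comment: str = "", method_comment: str = "") -> str:
    """Insert a header comment and a comment above each detected method."""
    lines = code.splitlines()
    out = []
    header_pending = bool(file_header_comment)

    # If the file does not open with "<?php", put the header on the first line.
    if header_pending and not (lines and lines[0].lstrip().startswith("<?php")):
        out.append(f"// {file_header_comment}")
        header_pending = False

    for index, line in enumerate(lines):
        match = METHOD_RE.match(line)
        if match and method_comment:
            # Reuse the method's indentation for the inserted comment.
            out.append(f"{match.group(1)}// {method_comment}")
        out.append(line)
        # Otherwise the header goes directly after the opening tag.
        if header_pending and index == 0:
            out.append(f"// {file_header_comment}")
            header_pending = False

    return "\n".join(out)
```

Under these assumptions, a class with one public method gains one header line after the opening tag and one comment line above the method, with the original code otherwise untouched.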

KeywordRemovalRule: Removes sensitive content based on keywords

{
  "documents": [{
    "description": "Redacted Documentation",
    "outputPath": "docs/redacted.md",
    "sources": [{
      "type": "file",
      "sourcePaths": ["config"],
      "modifiers": [{
        "name": "sanitizer",
        "options": {
          "rules": [{
            "type": "keyword",
            "keywords": ["API_KEY", "SECRET", "PASSWORD"],
            "replacement": "[REDACTED]",
            "caseSensitive": false,
            "removeLines": true
          }]
        }
      }]
    }]
  }]
}
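The options above can be read as follows, sketched in Python rather than the project's PHP. This is an illustration under stated assumptions, not the PR's code: `apply_keyword_rule` is a hypothetical name, and the sketch assumes that `removeLines: true` replaces the whole matching line with `replacement`, while `removeLines: false` replaces only the keyword itself.

```python
import re


def apply_keyword_rule(
    text: str,
    keywords: list,
    replacement: str = "[REDACTED]",
    case_sensitive: bool = False,
    remove_lines: bool = True,
) -> str:
    """Redact keywords line by line (assumed semantics, see lead-in)."""
    if not keywords:
        return text

    flags = 0 if case_sensitive else re.IGNORECASE
    # Escape keywords so they are matched literally, not as regex syntax.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), flags)

    out = []
    for line in text.splitlines():
        if not pattern.search(line):
            out.append(line)
        elif remove_lines:
            # Assumption: the entire line is swapped for the replacement text.
            out.append(replacement)
        else:
            # Use a function so the replacement is inserted literally.
            out.append(pattern.sub(lambda _m: replacement, line))
    return "\n".join(out)
```

With `caseSensitive` set to `false`, as in the configuration above, a line such as `api_key=abc123` would match the `API_KEY` keyword even though the case differs.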

RegexReplacementRule: Pattern-based text transformations

{
  "documents": [{
    "description": "Anonymized Content",
    "outputPath": "docs/anonymized.md",
    "sources": [{
      "type": "file",
      "sourcePaths": ["src/User"],
      "modifiers": [{
        "name": "sanitizer",
        "options": {
          "rules": [{
            "type": "regex",
            "patterns": {
              "/[\\w.+-]+@[\\w-]+\\.[\\w.-]+/": "[EMAIL]",
              "/\\d{3}-\\d{2}-\\d{4}/": "[SSN]" 
            }
          }]
        }
      }]
    }]
  }]
}
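A rough Python equivalent of the two patterns above shows what the rule is expected to do to a piece of text. The `apply_regex_rule` and `strip_delimiters` helpers are hypothetical names for this sketch. The patterns in the configuration use PHP-style `/.../` delimiters, which Python's `re` module does not use, so the sketch strips them and assumes no trailing pattern flags.

```python
import re


def strip_delimiters(pattern: str) -> str:
    """Drop PHP-style /.../ delimiters (assumes no trailing flags such as /i)."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern


def apply_regex_rule(text: str, patterns: dict) -> str:
    """Apply each pattern -> replacement pair in insertion order."""
    for pattern, replacement in patterns.items():
        # A function replacement inserts the text literally, without
        # interpreting backslashes or group references.
        text = re.sub(strip_delimiters(pattern), lambda _m, r=replacement: r, text)
    return text


PATTERNS = {
    r"/[\w.+-]+@[\w-]+\.[\w.-]+/": "[EMAIL]",
    r"/\d{3}-\d{2}-\d{4}/": "[SSN]",
}

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(apply_regex_rule(sample, PATTERNS))
# prints: Contact [EMAIL], SSN [SSN].
```

Note that the SSN pattern is unanchored, so it would also match a digit group of that shape inside a longer number; adding word boundaries is worth considering if that matters for the content being anonymized.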

2. AST Documentation Transformer

Adds AstDocTransformer, which converts PHP code into structured Markdown documentation:

  • Parses PHP code using Abstract Syntax Tree (AST)
  • Generates clean, readable markdown documentation
  • Preserves method implementations in code blocks
  • Extracts annotations like route information
  • Supports classes, interfaces, traits, and enums

Configuration Example:

{
  "documents": [{
    "description": "API Documentation",
    "outputPath": "docs/api-docs.md",
    "sources": [{
      "type": "file",
      "description": "Controllers",
      "sourcePaths": ["src/Controller"],
      "modifiers": [{
        "name": "php-docs",
        "options": {
          "include_private_methods": false,
          "include_protected_methods": true,
          "extract_routes": true,
          "include_implementations": true,
          "class_heading_level": 1,
          "method_heading_format": "### {name}",
          "code_block_format": "php"
        }
      }]
    }]
  }]
}

Combined Usage Example:

{
  "documents": [{
    "description": "Clean API Documentation",
    "outputPath": "docs/clean-api-docs.md",
    "sources": [{
      "type": "file",
      "description": "API Controllers",
      "sourcePaths": ["src/Controller"],
      "modifiers": [
        {
          "name": "php-docs",
          "options": {
            "include_private_methods": false,
            "extract_routes": true
          }
        },
        {
          "name": "sanitizer",
          "options": {
            "rules": [{
              "type": "keyword",
              "keywords": ["SECRET", "INTERNAL"]
            }]
          }
        }
      ]
    }]
  }]
}
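The combined example relies on modifiers being chained, with each one receiving the output of the previous one. The Python sketch below illustrates that idea only. It assumes modifiers run in the order they are listed, and the registry, the `run_modifiers` function, and the two stub modifiers (`docs-stub`, `sanitize-stub`) are hypothetical stand-ins, not the project's classes or modifier names.

```python
from typing import Callable, Dict, List

# A modifier takes the current content plus its options and returns new content.
Modifier = Callable[[str, dict], str]


def stub_docs(content: str, options: dict) -> str:
    """Stand-in for a documentation transformer: just adds a heading."""
    return "# Docs\n" + content


def stub_sanitize(content: str, options: dict) -> str:
    """Stand-in for a sanitizer: drops lines containing any keyword."""
    keywords = options.get("keywords", [])
    kept = [
        line for line in content.splitlines()
        if not any(keyword in line for keyword in keywords)
    ]
    return "\n".join(kept)


REGISTRY: Dict[str, Modifier] = {
    "docs-stub": stub_docs,
    "sanitize-stub": stub_sanitize,
}


def run_modifiers(content: str, modifiers: List[dict],
                  registry: Dict[str, Modifier] = REGISTRY) -> str:
    """Apply each configured modifier in list order (assumed behavior)."""
    for entry in modifiers:
        name = entry["name"]
        if name not in registry:
            raise KeyError(f"unknown modifier: {name}")
        content = registry[name](content, entry.get("options", {}))
    return content
```

Order matters in such a chain: sanitizing after the documentation step means the keyword rule sees the generated Markdown, not the raw source.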

This transformer is ideal for:

  • Preparing code for LLM context
  • Generating API documentation
  • Creating code overviews for team discussions

In summary, this PR introduces content sanitization with various rule types for content manipulation. This includes:
- ContextSanitizer with modular rule interface
- AstDocTransformer for code-to-documentation transformation
- Various sanitization rules (CommentInsertion, KeywordRemoval, RegexReplacement)
@butschster butschster merged commit 61a8b53 into main Mar 11, 2025
5 of 8 checks passed
@butschster butschster deleted the feature/modifiers branch March 11, 2025 21:05