AI-powered documentation assistant that auto-generates and updates project docs. Integrates with OpenAI to keep documentation in sync with code changes.

hans992/Proactive_Documentation_Assistant

Proactive Documentation Assistant

Overview

This AI agent proactively monitors a specified Git repository for new commits. When changes are detected, it analyzes the modified Python code and existing documentation (specifically the README file) to suggest updates for docstrings and the README itself. The goal is to help keep documentation consistent and up-to-date with code evolution.

The agent uses:

  • GitPython to interact with the Git repository.
  • Python's ast module to parse and understand the structure of Python code.
  • litellm to interact with a Large Language Model (LLM) like GPT-4o or Claude for generating documentation suggestions.
  • JSON files for basic state management (last processed commit) and storing suggestions.

Features

  • Git Monitoring: Periodically checks a specified branch of a local Git repository for new commits.
  • Code Analysis: Parses changed Python files to identify added, deleted, or modified functions and classes.
  • Docstring Suggestion: Generates suggestions for new or updated docstrings for added/modified Python functions and classes using an LLM.
  • README Analysis: Analyzes code changes and the current README file to suggest relevant updates or additions using an LLM.
  • State Management: Keeps track of the last processed commit to avoid redundant analysis.
  • Suggestion Storage: Saves generated suggestions in a JSON file for review.

Setup

  1. Clone/Download: Obtain the project code.
  2. Repository: Ensure you have a local clone of the Git repository you want to monitor. Update the REPO_PATH variable in main_agent_loop.py to point to this local repository.
  3. Dependencies: Install the required Python packages. It's recommended to use a virtual environment:
    python3.11 -m venv venv
    source venv/bin/activate # On Windows use `venv\Scripts\activate`
    python3.11 -m pip install -r requirements.txt
  4. API Key: Configure your LLM API key.
    • If a .env.example file is provided, copy it to .env; otherwise create a new file named .env. Either way, the file belongs in the project's root directory (/home/ubuntu/doc_assistant/).
    • Add your API key(s) to the .env file. For example:
      OPENAI_API_KEY="your_openai_api_key_here"
      # ANTHROPIC_API_KEY="your_anthropic_api_key_here"
    • Specify the LLM model you want to use (ensure it's supported by LiteLLM and your key):
      LLM_MODEL="gpt-4o" # Or "claude-3-opus-20240229", etc.
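As a minimal sketch of how these settings might be read at startup, the helper below looks up the model name and the matching API key from an environment-style mapping. It assumes the .env values are already loaded into the process environment. The function name `load_llm_settings`, the fallback to "gpt-4o", and the rule for choosing the key variable are illustrative assumptions, not the project's actual code.

```python
import os


def load_llm_settings(environ=None):
    """Return (model, api_key) from an environment-style mapping.

    Hypothetical helper: it only mirrors the variable names shown in the
    .env example and is not taken from the project's config.py.
    """
    env = os.environ if environ is None else environ
    # Falling back to "gpt-4o" is an assumption made for this sketch.
    model = env.get("LLM_MODEL", "gpt-4o")
    # Assumed convention: Claude models use the Anthropic key, all others OpenAI.
    key_name = "ANTHROPIC_API_KEY" if model.startswith("claude") else "OPENAI_API_KEY"
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"{key_name} is not set; add it to your .env file")
    return model, api_key
```

Failing early with a clear message avoids discovering a missing key only when the first LLM request is made.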

Configuration

Key parameters can be adjusted directly in the main_agent_loop.py file:

  • REPO_PATH: Absolute path to the local Git repository to monitor.
  • BRANCH_NAME: The specific branch within the repository to monitor (defaults to the currently active branch if possible, otherwise 'master').
  • POLL_INTERVAL: How often (in seconds) the agent checks for new commits.
  • STATE_FILE: Path to the file storing the last processed commit hash.
  • SUGGESTIONS_FILE: Path to the file where documentation suggestions will be saved.
  • README_FILENAME: The name of the README file to analyze (default: "README.md").
  • PYTHON_EXTENSIONS: Tuple of file extensions to consider as Python code (default: (".py",)).
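For orientation, the parameters above could look roughly like this as module-level constants. Only the 'master' fallback, the suggestions path, the README name, and the extensions tuple come from this README; the other values and the `is_python_file` helper are assumptions for illustration.

```python
# Illustrative values only. REPO_PATH, POLL_INTERVAL and STATE_FILE are
# assumptions; the remaining values are the defaults this README documents.
REPO_PATH = "/path/to/your/repo"             # placeholder: your local clone
BRANCH_NAME = "master"                       # documented fallback branch
POLL_INTERVAL = 60                           # seconds (assumed value)
STATE_FILE = "state/last_commit.json"        # assumed file name
SUGGESTIONS_FILE = "state/suggestions.json"  # documented default
README_FILENAME = "README.md"                # documented default
PYTHON_EXTENSIONS = (".py",)                 # documented default


def is_python_file(path):
    """Hypothetical helper: str.endswith accepts a tuple of suffixes."""
    return path.endswith(PYTHON_EXTENSIONS)
```

Keeping PYTHON_EXTENSIONS as a tuple means it can be passed straight to `str.endswith`, so adding another extension needs no change to the check.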

Running the Agent

Once set up and configured, run the main loop from the project's root directory (/home/ubuntu/doc_assistant/):

python3.11 main_agent_loop.py

The agent will start monitoring the repository. It will print logs to the console indicating its status, detected commits, analysis steps, and any generated suggestions or errors.

Press Ctrl+C to stop the agent.
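The run-until-interrupted behaviour can be sketched as a plain polling loop. This is a simplified illustration, not the project's main loop: `poll_for_commits` and its parameters are hypothetical, and the Git lookup and change processing are passed in as callables so the sketch stays self-contained.

```python
import time


def poll_for_commits(get_latest_hash, on_new_commit, last_seen=None,
                     interval=60, max_cycles=None, sleep=time.sleep):
    """Call on_new_commit(old, new) whenever the latest hash changes.

    max_cycles and sleep exist only to make the sketch easy to try out;
    with max_cycles=None the loop runs until Ctrl+C.
    """
    cycles = 0
    try:
        while max_cycles is None or cycles < max_cycles:
            latest = get_latest_hash()
            if latest != last_seen:
                on_new_commit(last_seen, latest)
                # Record the hash only after processing succeeded.
                last_seen = latest
            cycles += 1
            sleep(interval)
    except KeyboardInterrupt:
        # Ctrl+C lands here, so the loop exits without a traceback.
        print("Stopping agent.")
    return last_seen
```

Updating `last_seen` only after the callback returns means a commit whose processing raised an error is retried on the next run rather than silently skipped.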

Output: Suggestions

Generated suggestions are stored in the JSON file specified by SUGGESTIONS_FILE (default: state/suggestions.json). Each suggestion is a dictionary with the following structure:

{
  "file_path": "/path/to/your/repo/src/module.py", // File the suggestion applies to
  "element_name": "my_function", // Function/class name, or README filename
  "suggestion_type": "docstring", // or "readme"
  "original_content": "...", // Existing docstring or README content (for context)
  "suggested_content": "...", // The LLM-generated docstring or README update suggestion
  "confidence_score": null, // Placeholder for future enhancement
  "status": "pending", // Status (e.g., pending, accepted, rejected) - currently always pending
  "commit_hash": "abcdef123..." // Commit hash when the suggestion was generated
}

You can review this file to see the documentation updates proposed by the agent.
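One way to review the file programmatically is to load it and group the pending entries by source file. The sketch below assumes the file contains a JSON list of suggestion dictionaries with the fields shown above; `load_pending_suggestions` is a hypothetical helper, not part of the project.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_pending_suggestions(path="state/suggestions.json"):
    """Return {file_path: [suggestion, ...]} for suggestions still pending.

    Assumes the file holds a JSON list of suggestion dictionaries.
    """
    suggestions_path = Path(path)
    if not suggestions_path.exists():
        return {}
    suggestions = json.loads(suggestions_path.read_text(encoding="utf-8"))
    grouped = defaultdict(list)
    for suggestion in suggestions:
        if suggestion.get("status") == "pending":
            grouped[suggestion["file_path"]].append(suggestion)
    return dict(grouped)


if __name__ == "__main__":
    for file_path, items in load_pending_suggestions().items():
        names = ", ".join(item["element_name"] for item in items)
        print(f"{file_path}: {names}")
```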

Code Explanation

This section provides a high-level explanation of the main Python modules in the src/ directory.

  • config.py: Loads configuration from the .env file (API keys, LLM model) and defines constants like system prompts and timeouts. Uses python-dotenv.

  • git_monitor.py: Handles all interactions with the Git repository using the GitPython library.

    • get_latest_commit_hash(repo_path, branch): Fetches the SHA hash of the most recent commit on the specified branch.
    • get_changed_files(repo_path, old_commit_hash, new_commit_hash): Determines the list of files that were added, deleted, or modified between two commits.
    • get_file_content_at_commit(repo_path, file_path, commit_hash): Retrieves the full content of a specific file as it existed at a particular commit hash.
  • code_parser.py: Parses Python code files using Python's built-in ast (Abstract Syntax Tree) module.

    • CodeVisitor(ast.NodeVisitor): Traverses the AST to find top-level class and function definitions.
    • parse_code(code_content, file_path): Takes Python code as a string, parses it into an AST, uses CodeVisitor to extract information (name, signature, docstring, line numbers, code block) about classes and functions, and returns a list of dictionaries representing these elements.
  • code_comparator.py: Compares the parsed structures of two versions of a Python file.

    • compare_code_versions(old_analysis, new_analysis): Takes the lists returned by parse_code for the old and new versions of a file. It identifies which top-level functions/classes were added, deleted, or potentially modified by comparing names, signatures, and docstrings, and returns a dictionary categorizing these changes.
  • doc_analyzer.py: Contains logic to find relevant sections within documentation files (currently focused on README.md).

    • find_relevant_doc_sections_heuristic(...): A simple heuristic approach that splits the README into sections based on Markdown headers and searches for keywords (names of changed functions/classes) within each section.
    • find_relevant_doc_sections(...): The main entry point; it currently defaults to the heuristic method (LLM-based analysis is planned but not implemented).
  • llm_interactor_doc.py: Manages interactions with the Large Language Model (LLM) via the litellm library.

    • _call_llm(...): A private helper function to send requests to the configured LLM, handling potential errors.
    • construct_docstring_prompt(...): Creates the specific prompt asking the LLM to generate or update a docstring based on code information and change context.
    • construct_readme_prompt(...): Creates the prompt asking the LLM to suggest README updates based on a summary of code changes and the existing README content.
    • generate_docstring(...): Orchestrates the process of creating the prompt and calling the LLM for docstring generation.
    • suggest_readme_updates(...): Orchestrates the process for generating README suggestions.
  • suggestion_generator.py: Cleans and formats the raw text output from the LLM into a structured JSON suggestion.

    • _clean_llm_docstring_output(...): Attempts to extract only the valid docstring (including triple quotes) from the LLM's potentially verbose response.
    • format_suggestion(llm_output, context, suggestion_type): Takes the raw LLM output and contextual information (file path, element name, etc.) and creates the standard suggestion dictionary used in suggestions.json.
  • main_agent_loop.py: The main executable script that orchestrates the agent's workflow.

    • Configuration & State: Defines paths, polling interval, and loads/saves the last processed commit hash (STATE_FILE) and suggestions (SUGGESTIONS_FILE).
    • process_changes(...): This core function is triggered when a new commit is detected. It iterates through changed files, calls git_monitor to get content, code_parser to analyze, code_comparator to find differences, and then uses llm_interactor_doc and suggestion_generator to create docstring suggestions for changed Python code. It also aggregates code changes to potentially trigger README analysis via llm_interactor_doc.
    • main_loop(): The main loop that periodically checks for new commits using git_monitor.get_latest_commit_hash. If a new commit is found, it calls process_changes and updates the state. Includes basic error handling and waits for the specified POLL_INTERVAL.
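The parse-then-compare step described for code_parser.py and code_comparator.py can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the project's implementation: it walks the module body directly instead of using a NodeVisitor subclass, returns a dictionary keyed by name rather than a list, and the names `parse_top_level` and `compare_versions` are hypothetical. It relies on `ast.unparse`, which requires Python 3.9 or newer.

```python
import ast


def parse_top_level(code):
    """Return {name: info} for top-level functions and classes in code."""
    elements = {}
    for node in ast.parse(code).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.unparse (Python 3.9+) renders the argument list as source text.
            signature = f"{node.name}({ast.unparse(node.args)})"
        elif isinstance(node, ast.ClassDef):
            signature = f"class {node.name}"
        else:
            continue  # imports, assignments, etc. are ignored
        elements[node.name] = {
            "signature": signature,
            "docstring": ast.get_docstring(node),
            "lineno": node.lineno,
        }
    return elements


def compare_versions(old, new):
    """Categorize elements as added, deleted, or modified.

    "Modified" here means the signature or docstring differs, mirroring
    the comparison criteria described above.
    """
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        name for name in old.keys() & new.keys()
        if (old[name]["signature"], old[name]["docstring"])
        != (new[name]["signature"], new[name]["docstring"])
    )
    return {"added": added, "deleted": deleted, "modified": modified}
```

Because only signatures and docstrings are compared, a function whose body changed but whose signature and docstring did not is not reported as modified in this sketch.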

Limitations & Future Work

  • Basic Code Comparison: The current comparison logic is simple. It detects changes mainly through name, signature, and docstring differences, so it may miss complex refactorings.
  • Heuristic README Analysis: README analysis relies on basic keyword matching; an LLM-based approach could be more robust.
  • Suggestion Application: The agent only generates suggestions; applying them to the codebase requires manual intervention or further tooling.
  • Error Handling: Error handling is basic; more specific error catching and retry logic could improve robustness.
  • Configuration: Configuration is mostly hardcoded in the main script; using command-line arguments or a dedicated config file would be better.
  • Testing: Limited automated tests are included within some modules; comprehensive unit and integration tests are needed.
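As one possible direction for the configuration item above, the hardcoded parameters could be exposed as command-line arguments with `argparse`. None of this exists in the project today: the flag names and the 60-second default are assumptions chosen to match the parameters listed under Configuration.

```python
import argparse


def build_arg_parser():
    """Hypothetical CLI mirroring the constants in main_agent_loop.py."""
    parser = argparse.ArgumentParser(
        description="Proactive Documentation Assistant (CLI sketch)"
    )
    parser.add_argument("--repo-path", required=True,
                        help="Absolute path to the local Git repository")
    parser.add_argument("--branch", default="master",
                        help="Branch to monitor")
    parser.add_argument("--poll-interval", type=int, default=60,
                        help="Seconds between checks (assumed default)")
    parser.add_argument("--suggestions-file", default="state/suggestions.json",
                        help="Where suggestions are written")
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    print(f"Monitoring {args.repo_path} on {args.branch} every {args.poll_interval}s")
```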

Licensing & Commercial Use

This project is licensed under a custom license prohibiting commercial use without permission.

For commercial inquiries or licensing requests, please contact me at [[email protected]] or [https://github.com/hans992].

Python 3.8+
