This AI agent proactively monitors a specified Git repository for new commits. When changes are detected, it analyzes the modified Python code and existing documentation (specifically the README file) to suggest updates for docstrings and the README itself. The goal is to help keep documentation consistent and up-to-date with code evolution.
The agent uses:
- `GitPython` to interact with the Git repository.
- Python's `ast` module to parse and understand the structure of Python code.
- `litellm` to interact with a Large Language Model (LLM) like GPT-4o or Claude for generating documentation suggestions.
- JSON files for basic state management (last processed commit) and storing suggestions.
- Git Monitoring: Periodically checks a specified branch of a local Git repository for new commits.
- Code Analysis: Parses changed Python files to identify added, deleted, or modified functions and classes.
- Docstring Suggestion: Generates suggestions for new or updated docstrings for added/modified Python functions and classes using an LLM.
- README Analysis: Analyzes code changes and the current README file to suggest relevant updates or additions using an LLM.
- State Management: Keeps track of the last processed commit to avoid redundant analysis.
- Suggestion Storage: Saves generated suggestions in a JSON file for review.
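The state management feature can be pictured with a small sketch. The file layout (a JSON object with a single `last_commit` key) and the helper names `load_last_commit` and `save_last_commit` are assumptions for illustration, not the project's actual code.

```python
import json
from pathlib import Path
from typing import Optional


def load_last_commit(state_file: Path) -> Optional[str]:
    """Return the last processed commit hash, or None if no usable state exists."""
    try:
        data = json.loads(state_file.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    return data.get("last_commit")


def save_last_commit(state_file: Path, commit_hash: str) -> None:
    """Persist the commit hash, creating the parent directory if needed."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"last_commit": commit_hash}), encoding="utf-8")
```

Returning `None` for a missing or unreadable file lets a first run treat every commit as new instead of failing at startup.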
- Clone/Download: Obtain the project code.
- Repository: Ensure you have a local clone of the Git repository you want to monitor. Update the `REPO_PATH` variable in `main_agent_loop.py` to point to this local repository.
- Dependencies: Install the required Python packages. It's recommended to use a virtual environment:

      python3.11 -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
      python3.11 -m pip install -r requirements.txt

- API Key: Configure your LLM API key.
  - Copy the `.env.example` file (if provided) or create a new file named `.env` in the project's root directory (`/home/ubuntu/doc_assistant/`).
  - Add your API key(s) to the `.env` file. For example:

        OPENAI_API_KEY="your_openai_api_key_here"
        # ANTHROPIC_API_KEY="your_anthropic_api_key_here"

  - Specify the LLM model you want to use (ensure it's supported by LiteLLM and your key):

        LLM_MODEL="gpt-4o"  # Or "claude-3-opus-20240229", etc.

Key parameters can be adjusted directly in the `main_agent_loop.py` file:
- `REPO_PATH`: Absolute path to the local Git repository to monitor.
- `BRANCH_NAME`: The specific branch within the repository to monitor (defaults to the currently active branch if possible, otherwise 'master').
- `POLL_INTERVAL`: How often (in seconds) the agent checks for new commits.
- `STATE_FILE`: Path to the file storing the last processed commit hash.
- `SUGGESTIONS_FILE`: Path to the file where documentation suggestions will be saved.
- `README_FILENAME`: The name of the README file to analyze (default: "README.md").
- `PYTHON_EXTENSIONS`: Tuple of file extensions to consider as Python code (default: (".py",)).
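For orientation, these parameters might appear near the top of `main_agent_loop.py` roughly as follows. This is a sketch, not the project's file: the repository path, the 60-second interval, and the state filename are placeholder assumptions, and `is_python_file` is a hypothetical helper showing how the extension tuple could be used.

```python
# Illustrative values only; adjust to your environment.
REPO_PATH = "/path/to/your/repo"             # placeholder: absolute path to the local clone
BRANCH_NAME = "master"                       # the fallback branch named in this README
POLL_INTERVAL = 60                           # seconds between checks (assumed value)
STATE_FILE = "state/last_commit.json"        # assumed filename
SUGGESTIONS_FILE = "state/suggestions.json"  # default stated in this README
README_FILENAME = "README.md"
PYTHON_EXTENSIONS = (".py",)


def is_python_file(path: str) -> bool:
    """Return True if the path ends with one of the configured extensions."""
    # str.endswith accepts a tuple of suffixes, which is why a tuple is convenient here.
    return path.endswith(PYTHON_EXTENSIONS)
```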
Once set up and configured, run the main loop from the project's root directory (`/home/ubuntu/doc_assistant/`):

    python3.11 main_agent_loop.py

The agent will start monitoring the repository. It will print logs to the console indicating its status, detected commits, analysis steps, and any generated suggestions or errors.
Press Ctrl+C to stop the agent.
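The run-until-interrupted behaviour can be sketched as a plain polling loop. This is a simplified illustration, not the project's `main_loop()`: the `get_latest` and `on_new_commit` callables and the `max_iterations` parameter are hypothetical stand-ins so the loop can be exercised without a real repository.

```python
import time
from typing import Callable, Optional


def poll_for_commits(
    get_latest: Callable[[], str],
    on_new_commit: Callable[[Optional[str], str], None],
    poll_interval: float,
    last_seen: Optional[str] = None,
    max_iterations: Optional[int] = None,
) -> Optional[str]:
    """Call on_new_commit(old, new) whenever get_latest() returns a new hash."""
    iterations = 0
    try:
        while max_iterations is None or iterations < max_iterations:
            iterations += 1
            try:
                latest = get_latest()
                if latest != last_seen:
                    on_new_commit(last_seen, latest)
                    last_seen = latest
            except Exception as exc:  # keep the loop alive if one cycle fails
                print(f"Error during polling cycle: {exc}")
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        # Ctrl+C ends the loop cleanly instead of printing a traceback.
        print("Stopping agent.")
    return last_seen
```

Updating `last_seen` only after `on_new_commit` returns means a commit whose processing raised an error is retried on the next cycle.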
Generated suggestions are stored in the JSON file specified by `SUGGESTIONS_FILE` (default: `state/suggestions.json`). Each suggestion is a dictionary with the following structure:

    {
      "file_path": "/path/to/your/repo/src/module.py",  // File the suggestion applies to
      "element_name": "my_function",                    // Function/class name, or README filename
      "suggestion_type": "docstring",                   // or "readme"
      "original_content": "...",                        // Existing docstring or README content (for context)
      "suggested_content": "...",                       // The LLM-generated docstring or README update suggestion
      "confidence_score": null,                         // Placeholder for future enhancement
      "status": "pending",                              // Status (e.g., pending, accepted, rejected) - currently always pending
      "commit_hash": "abcdef123..."                     // Commit hash when the suggestion was generated
    }

You can review this file to see the documentation updates proposed by the agent.
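To review suggestions programmatically, a short script along these lines may help. It assumes the file holds a JSON list of suggestion dictionaries, without the explanatory `//` comments used in the structure above; the helper name `load_pending` is illustrative.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def load_pending(path: str, suggestion_type: Optional[str] = None) -> List[Dict]:
    """Return pending suggestions, optionally filtered by suggestion_type."""
    file = Path(path)
    if not file.exists():
        return []
    suggestions = json.loads(file.read_text(encoding="utf-8"))
    return [
        s for s in suggestions
        if s.get("status") == "pending"
        and (suggestion_type is None or s.get("suggestion_type") == suggestion_type)
    ]


if __name__ == "__main__":
    # Print one line per pending docstring suggestion.
    for s in load_pending("state/suggestions.json", "docstring"):
        print(f"{s['file_path']}::{s['element_name']} (commit {s['commit_hash']})")
```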
This section provides a high-level explanation of the main Python modules in the `src/` directory.
- `config.py`: Loads configuration from the `.env` file (API keys, LLM model) and defines constants like system prompts and timeouts. Uses `python-dotenv`.
- `git_monitor.py`: Handles all interactions with the Git repository using the `GitPython` library.
  - `get_latest_commit_hash(repo_path, branch)`: Fetches the SHA hash of the most recent commit on the specified branch.
  - `get_changed_files(repo_path, old_commit_hash, new_commit_hash)`: Determines the list of files that were added, deleted, or modified between two commits.
  - `get_file_content_at_commit(repo_path, file_path, commit_hash)`: Retrieves the full content of a specific file as it existed at a particular commit hash.
- `code_parser.py`: Parses Python code files using Python's built-in `ast` (Abstract Syntax Tree) module.
  - `CodeVisitor(ast.NodeVisitor)`: Traverses the AST to find top-level class and function definitions.
  - `parse_code(code_content, file_path)`: Takes Python code as a string, parses it into an AST, uses `CodeVisitor` to extract information (name, signature, docstring, line numbers, code block) about classes and functions, and returns a list of dictionaries representing these elements.
- `code_comparator.py`: Compares the parsed structures of two versions of a Python file.
  - `compare_code_versions(old_analysis, new_analysis)`: Takes the list outputs from `code_parser.py` for an old and new version of a file. It identifies which top-level functions/classes were added, deleted, or potentially modified by comparing names, signatures, and docstrings. Returns a dictionary categorizing these changes.
- `doc_analyzer.py`: Contains logic to find relevant sections within documentation files (currently focused on README.md).
  - `find_relevant_doc_sections_heuristic(...)`: A simple heuristic approach that splits the README into sections based on Markdown headers and searches for keywords (names of changed functions/classes) within each section.
  - `find_relevant_doc_sections(...)`: The main entry point, which currently defaults to the heuristic method. (LLM-based analysis is planned but not implemented.)
- `llm_interactor_doc.py`: Manages interactions with the Large Language Model (LLM) via the `litellm` library.
  - `_call_llm(...)`: A private helper function to send requests to the configured LLM, handling potential errors.
  - `construct_docstring_prompt(...)`: Creates the specific prompt asking the LLM to generate or update a docstring based on code information and change context.
  - `construct_readme_prompt(...)`: Creates the prompt asking the LLM to suggest README updates based on a summary of code changes and the existing README content.
  - `generate_docstring(...)`: Orchestrates the process of creating the prompt and calling the LLM for docstring generation.
  - `suggest_readme_updates(...)`: Orchestrates the process for generating README suggestions.
- `suggestion_generator.py`: Cleans and formats the raw text output from the LLM into a structured JSON suggestion.
  - `_clean_llm_docstring_output(...)`: Attempts to extract only the valid docstring (including triple quotes) from the LLM's potentially verbose response.
  - `format_suggestion(llm_output, context, suggestion_type)`: Takes the raw LLM output and contextual information (file path, element name, etc.) and creates the standard suggestion dictionary used in `suggestions.json`.
- `main_agent_loop.py`: The main executable script that orchestrates the agent's workflow.
  - Configuration & State: Defines paths, polling interval, and loads/saves the last processed commit hash (`STATE_FILE`) and suggestions (`SUGGESTIONS_FILE`).
  - `process_changes(...)`: This core function is triggered when a new commit is detected. It iterates through changed files, calls `git_monitor` to get content, `code_parser` to analyze, `code_comparator` to find differences, and then uses `llm_interactor_doc` and `suggestion_generator` to create docstring suggestions for changed Python code. It also aggregates code changes to potentially trigger README analysis via `llm_interactor_doc`.
  - `main_loop()`: The main loop that periodically checks for new commits using `git_monitor.get_latest_commit_hash`. If a new commit is found, it calls `process_changes` and updates the state. Includes basic error handling and waits for the specified `POLL_INTERVAL`.
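The parse-then-compare flow described for `code_parser.py` and `code_comparator.py` can be illustrated with the standard `ast` module. This is a minimal sketch, not the project's implementation: it iterates over `tree.body` instead of subclassing `ast.NodeVisitor`, it omits the `file_path` argument, and the dictionary keys, the `added`/`deleted`/`modified` result shape, and the use of `ast.unparse` for signatures (Python 3.9+) are all assumptions.

```python
import ast
from typing import Dict, List


def parse_code(code_content: str) -> List[Dict]:
    """Extract name, kind, signature, and docstring of top-level definitions."""
    tree = ast.parse(code_content)
    elements = []
    for node in tree.body:  # top-level statements only, no nested definitions
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            signature = f"{node.name}({ast.unparse(node.args)})"
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            signature = f"{node.name}({bases})"
            kind = "class"
        else:
            continue
        elements.append({
            "name": node.name,
            "kind": kind,
            "signature": signature,
            "docstring": ast.get_docstring(node),
            "start_line": node.lineno,
            "end_line": node.end_lineno,
        })
    return elements


def compare_code_versions(old: List[Dict], new: List[Dict]) -> Dict[str, List[str]]:
    """Categorize element names as added, deleted, or modified."""
    old_by_name = {e["name"]: e for e in old}
    new_by_name = {e["name"]: e for e in new}
    added = [n for n in new_by_name if n not in old_by_name]
    deleted = [n for n in old_by_name if n not in new_by_name]
    # "Modified" here means the signature or the docstring differs.
    modified = [
        n for n in new_by_name
        if n in old_by_name
        and (old_by_name[n]["signature"], old_by_name[n]["docstring"])
        != (new_by_name[n]["signature"], new_by_name[n]["docstring"])
    ]
    return {"added": added, "deleted": deleted, "modified": modified}
```

Because only signatures and docstrings are compared, a change confined to a function body goes unreported in this sketch.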
- Basic Code Comparison: The current comparison logic is simple. It detects modifications only through signature and docstring differences, so it might miss complex refactorings.
- Heuristic README Analysis: README analysis relies on basic keyword matching; an LLM-based approach could be more robust.
- Suggestion Application: The agent only generates suggestions; applying them to the codebase requires manual intervention or further tooling.
- Error Handling: Error handling is basic; more specific error catching and retry logic could improve robustness.
- Configuration: Configuration is mostly hardcoded in the main script; using command-line arguments or a dedicated config file would be better.
- Testing: Limited automated tests are included within some modules; comprehensive unit and integration tests are needed.
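To make the keyword-matching limitation concrete, the heuristic README analysis can be sketched as below. This is an illustration of the general idea (split on Markdown headers, then search each section for changed names), not the code in `doc_analyzer.py`; the function names and the `(preamble)` label are hypothetical.

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headers such as "## Usage".
# Note: a "#" comment line inside a code block would also match.
HEADER_RE = re.compile(r"^#{1,6}\s+(.*)$")


def split_into_sections(readme_text: str) -> Dict[str, str]:
    """Map each Markdown header title to the text beneath it."""
    sections: Dict[str, List[str]] = {"(preamble)": []}
    current = "(preamble)"
    for line in readme_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1).strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {title: "\n".join(body) for title, body in sections.items()}


def find_sections_mentioning(readme_text: str, keywords: List[str]) -> List[str]:
    """Return titles of sections whose body contains any keyword as a whole word."""
    hits = []
    for title, body in split_into_sections(readme_text).items():
        if any(re.search(rf"\b{re.escape(k)}\b", body) for k in keywords):
            hits.append(title)
    return hits
```

A section that describes a changed function without naming it is never matched, which is why an LLM-based approach could be more robust.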
This project is licensed under a custom license prohibiting commercial use without permission.
For commercial inquiries or licensing requests, please contact me at [[email protected]] or [https://github.com/hans992].