An intelligent code analysis system that combines semantic search with AI agentic workflows for automated Pull Request review and code analysis. Clone GitHub or Azure DevOps repositories, store them in a vector database, and perform intelligent semantic searches across your codebase with AI-powered analysis.
- Multi-Platform Support: Clone from GitHub and Azure DevOps repositories
- Semantic Search: Use natural language queries to find relevant code
- Vector Storage: Efficient storage and retrieval using Supabase vector database
- AI-Powered PR Analysis: Automated Pull Request review using agentic workflows
- Smart File Filtering: Automatically processes only relevant code files
- Interactive CLI: Easy-to-use command-line interface
- Batch Operations: Store and search multiple repositories
- Python 3.10+
- OpenAI API key (for embeddings and LLM)
- Supabase account and database
Note: I used Qwen-3-coder as the LLM, but you can use any LLM of your choice; see the LLM assignment in `workflow/nodes.py`.
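If you do swap models, the change is typically one line. A minimal sketch, assuming the node uses a LangChain chat model behind an OpenAI-compatible endpoint (the class, model name, and URL below are placeholders, not the project's exact code):

```python
# Hypothetical sketch of swapping the LLM; the actual assignment lives
# in workflow/nodes.py, and the endpoint/model below are placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="qwen3-coder",                  # any chat model your provider exposes
    base_url="https://your-provider/v1",  # OpenAI-compatible endpoint (assumption)
    temperature=0,
)
```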
- Clone this repository:

```bash
git clone <your-repo-url>
cd repository-search-system
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your API keys
```

- Set up Supabase database:
  - Run the SQL commands in `supabase_schema.sql` in your Supabase SQL editor
  - This creates the necessary tables and functions
Store a GitHub repository:

```bash
python store_repos.py
```

Store a custom repository interactively:

```bash
python store_repos.py --custom
```

Interactive search:

```bash
python search_repos.py --interactive
```

Search all repositories:

```bash
python search_repos.py --all "authentication logic"
```

Run example searches:

```bash
python search_repos.py
```

Execute the agentic workflow for PR analysis:

```bash
python examples/pr_analysis_workflow_demo.py
```

Test the agentic workflow components:

```bash
python examples/test_agentic_workflow.py
```

Create a `.env` file with the following variables:
```bash
# OpenAI API Key for embeddings
OPENAI_API_KEY=your_openai_api_key_here

# Supabase Database URL
SUPABASE_DB_URL=postgresql://user:password@host:port/database

# Optional: Azure DevOps Personal Access Token
AZURE_API_KEY=your_azure_pat_here
```
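For reference, a minimal sketch of reading these variables at startup, assuming `python-dotenv` is used (the package choice is an assumption, not confirmed by the project):

```python
# Minimal sketch of loading the .env file, assuming python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

openai_key = os.environ["OPENAI_API_KEY"]
db_url = os.environ["SUPABASE_DB_URL"]
azure_pat = os.getenv("AZURE_API_KEY")  # optional; None if unset
```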
The system automatically processes these file types:

- Code: `.py`, `.js`, `.ts`, `.java`, `.cpp`, `.cs`, `.php`, `.rb`, `.go`, `.rs`, `.swift`
- Web: `.html`, `.css`, `.scss`, `.jsx`, `.tsx`
- Config: `.json`, `.yaml`, `.yml`, `.toml`, `.sql`
- Documentation: `.md`, `.txt`, `.rst`
- Special files: `README`, `LICENSE`, `Dockerfile`, `requirements.txt`, etc.
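As a rough illustration of this filtering rule (including the 100KB size cap mentioned under Troubleshooting), here is a hypothetical sketch; the names are illustrative, not the project's actual code:

```python
# Hypothetical sketch of the file-filtering rule described above.
from pathlib import Path

ALLOWED_EXTENSIONS = {
    ".py", ".js", ".ts", ".java", ".cpp", ".cs", ".php", ".rb", ".go",
    ".rs", ".swift", ".html", ".css", ".scss", ".jsx", ".tsx", ".json",
    ".yaml", ".yml", ".toml", ".sql", ".md", ".txt", ".rst",
}
SPECIAL_FILES = {"README", "LICENSE", "Dockerfile", "requirements.txt"}
MAX_FILE_SIZE = 100 * 1024  # files over 100KB are skipped

def is_relevant_file(path: Path) -> bool:
    """Return True if a file should be embedded and stored."""
    if path.stat().st_size > MAX_FILE_SIZE:
        return False
    return path.suffix.lower() in ALLOWED_EXTENSIONS or path.name in SPECIAL_FILES
```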
```python
# Store GitHub repository
store_github_repo(github_url, branch=None, refresh=False)

# Store Azure DevOps repository
store_azure_repo(organization, project, repository, branch=None, pat=None, refresh=False)
```

```python
# Search specific repository
search_repository(repo_name, query, limit=5, show_content=False)

# Search all repositories
search_all_repos(query, limit=3)

# Interactive search interface
interactive_search()
```

```python
# Get repository information from GitHub
get_github_repo_info(github_url, branch=None)

# Get repository information from Azure DevOps
get_azure_repo_info(organization, project, repository, branch=None, pat=None)
```

```python
# Store repository in vector database
store_repo_in_own_collection(repo_info, refresh=False)

# Search within repository collection
search_repo_collection(repo_name, query, limit=5)

# List all repository collections
list_repo_collections()
```

```python
# GitHub repository
store_github_repo("https://github.com/user/repo", branch="main")

# Azure DevOps repository
store_azure_repo("org", "project", "repo", branch="develop")
```

```python
# Find authentication code
search_repository("my-repo", "user authentication and login")

# Find database queries
search_repository("my-repo", "SQL queries and database connections")

# Find API endpoints
search_repository("my-repo", "REST API endpoints and routes")
```

```python
# Run the complete agentic workflow for PR analysis
from workflow.graph import create_default_workflow
from workflow.state import WorkflowState

# Create workflow
workflow = create_default_workflow()

# Create initial state with PR data
state = WorkflowState(collection_name="repo-name", pr_data=json_pr_data)

# Execute workflow
result = workflow.invoke(state)
```

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Git Repos    │────▶│   Processing    │────▶│    Supabase     │
│ (GitHub/Azure)  │     │   & Embedding   │     │  Vector Store   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │                       │
                                 ▼                       ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │   File Filter   │     │   Search API    │
                        │   & Content     │     │   & Results     │
                        └─────────────────┘     └─────────────────┘
```
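To make the Processing & Embedding stage concrete, here is a minimal sketch of embedding one file and inserting it into a pgvector table; the table name, columns, and embedding model are assumptions (see `supabase_schema.sql` for the real schema):

```python
# Minimal sketch: embed one file and store it in Supabase/pgvector.
# Table and column names are assumptions, not the project's schema.
import os

import psycopg2
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def store_file(conn, repo: str, path: str, content: str) -> None:
    # Embed the file content with OpenAI (model choice is an assumption)
    emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=content,
    ).data[0].embedding
    with conn.cursor() as cur:
        # pgvector accepts the "[x, y, ...]" text form, so str(emb) works
        cur.execute(
            "INSERT INTO repo_files (repo, path, content, embedding) "
            "VALUES (%s, %s, %s, %s::vector)",
            (repo, path, content, str(emb)),
        )
    conn.commit()

conn = psycopg2.connect(os.environ["SUPABASE_DB_URL"])
store_file(conn, "my-repo", "src/app.py", "print('hello')")
```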
The system implements an AI-powered agentic workflow for Pull Request (PR) analysis using LangGraph. This workflow consists of specialized agents that collaborate to analyze code changes and provide comprehensive feedback.
- Parsing Function (`parse_pr_data`): Extracts repository information and collection name from PR data
- Analysis Agent (`pr_analysis_agent`): Reviews code changes for correctness, style, maintainability, and security risks
- Tools: Vector database query tools that allow agents to explore the repository context (a wiring sketch follows below)
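A minimal sketch of how these two nodes might be wired together with LangGraph; the real graph is built by `workflow.graph.create_default_workflow`, so the builder code below is only illustrative:

```python
# Illustrative wiring only; the project's actual graph comes from
# workflow.graph.create_default_workflow.
from langgraph.graph import StateGraph, START, END

from workflow.nodes import parse_pr_data, pr_analysis_agent  # assumed location
from workflow.state import WorkflowState

builder = StateGraph(WorkflowState)
builder.add_node("parse_pr_data", parse_pr_data)
builder.add_node("pr_analysis_agent", pr_analysis_agent)
builder.add_edge(START, "parse_pr_data")
builder.add_edge("parse_pr_data", "pr_analysis_agent")
builder.add_edge("pr_analysis_agent", END)

workflow = builder.compile()
```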
The PR Analysis Agent has access to the following tools:

- `list_directories`: List all file paths in the repository
- `get_metadata_by_id`: Inspect metadata for a file without reading its full content
- `get_content_by_id`: Fetch the actual code of a file for deeper analysis
- `search_vector_database`: Search semantically for related files or concepts in the repo
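As a rough sketch, tools like these could be exposed to the agent via LangChain's `@tool` decorator; the bodies below are placeholders, not the project's implementations:

```python
# Rough sketch of exposing two of the tools above to a LangChain chat
# model; the function bodies are placeholders.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def list_directories() -> list[str]:
    """List all file paths in the repository collection."""
    return []  # placeholder: the real tool queries the vector store

@tool
def get_content_by_id(file_id: str) -> str:
    """Fetch the full content of a file for deeper analysis."""
    return ""  # placeholder

llm = ChatOpenAI(model="gpt-4o-mini")  # model name is an assumption
agent_llm = llm.bind_tools([list_directories, get_content_by_id])
```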
The agentic workflow follows this sequence:
- Parse PR data to extract repository information
- Analyze code changes using the AI agent with access to repository tools
- Generate structured feedback including (an example shape is sketched after this list):
  - Summary of the PR
  - Detailed findings for each file
  - Security concerns
  - Final recommendation
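For illustration, the structured feedback might look like the following; the exact keys are assumptions, so run the demo script to see real output:

```python
# Hypothetical shape of the structured feedback; actual keys may differ.
feedback = {
    "summary": "Adds JWT-based login endpoint and refresh-token rotation.",
    "findings": [
        {
            "file": "auth/login.py",
            "issues": ["Token expiry is hard-coded; consider a config value."],
        },
    ],
    "security_concerns": ["Secret key is read from source, not the environment."],
    "recommendation": "Request changes",
}
```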
The workflow maintains state throughout execution using the `WorkflowState` dataclass (a simplified sketch follows the list below), which tracks:
- Collection name and PR data
- Completed workflow nodes
- Analysis results and metadata
- Execution timing information
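A simplified sketch of such a dataclass; the authoritative definition lives in `workflow/state.py`, and field names beyond `collection_name` and `pr_data` are assumptions:

```python
# Simplified sketch; the real WorkflowState is defined in workflow/state.py.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class WorkflowState:
    collection_name: str                  # target repository collection
    pr_data: dict[str, Any]               # raw PR payload to analyze
    completed_nodes: list[str] = field(default_factory=list)
    analysis_results: dict[str, Any] = field(default_factory=dict)
    started_at: float | None = None       # execution timing info
    finished_at: float | None = None
```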
- "No collections found": Run
store_repos.pyfirst to add repositories - Authentication errors: Check your API keys in
.envfile - Database connection: Verify your
SUPABASE_DB_URLis correct - Large repositories: The system skips files >100KB automatically
- Use specific queries for better results
- Limit search results for faster responses
- Refresh collections only when necessary
- Monitor your OpenAI API usage
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request