This is a Flask-based REST API application that leverages Large Language Models (LLMs) via LangChain to automatically analyze and summarize any public GitHub repository.
The application scans the project structure, reads the README.md, uses AI to select the most important source code files, and generates a structured JSON response describing the project, its technology stack, and its architecture based on those files.
- Smart file selection: The LLM analyzes the file tree and selects up to 6 key files to understand the project's logic, ignoring binaries and clutter.
- Deep analysis: Reads the contents of the selected files (with a size limit to avoid exceeding the context window) and forms an accurate description.
- Structured output: Uses LangChain's `JsonOutputParser` and Pydantic to guarantee a valid JSON response.
- Logging: Detailed execution logs (including LLM prompts and responses) are saved to `app.log`.
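
The size limit mentioned under deep analysis could look roughly like the sketch below. It is an illustration only, not the application's actual code: the function name `truncate_for_context`, the 8,000-character default, and the truncation marker are all assumptions.

```python
def truncate_for_context(text: str, max_chars: int = 8000) -> str:
    """Return text unchanged if it fits, otherwise cut it and mark the cut."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text
    # Hypothetical marker so the LLM can tell the file was cut short.
    return text[:max_chars] + "\n... [truncated]"
```

A character limit is only a rough proxy for tokens; a tokenizer-based limit would be more precise, at the cost of an extra dependency.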
- Python 3.9+
- GitHub Personal Access Token (authenticated requests get a much higher GitHub API rate limit)
- LLM Provider API Key (configured for Nebius API with Llama 3.1 / Qwen models in this setup)
- Clone the repository (or create a project folder):

      git clone
      cd

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate # On Windows use: venv\Scripts\activate

- Install dependencies:

      pip install flask requests python-dotenv langchain-openai pydantic langchain-core gunicorn

- Configure environment variables: create a `.env` file in the root of the project and add your keys:

      NEBIUS_API_KEY=your_nebius_api_key
      GITHUB_TOKEN=your_github_token

- Start the server with Gunicorn:

      gunicorn app:app -w 4 -b 0.0.0.0:8000

  The server will start at http://0.0.0.0:8000.

Gunicorn parameters:
- `-w 4`: number of worker processes (adjust based on CPU cores)
- `-b 0.0.0.0:8000`: bind to all interfaces on port 8000
- `app:app`: module name and Flask app object
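
Because the service depends on both keys from the `.env` file, it can help to fail fast at startup when one is missing. The sketch below is one way to do that; `load_settings` and `REQUIRED_KEYS` are hypothetical names, not part of the application.

```python
from typing import Mapping

# The two variables named in the .env step above.
REQUIRED_KEYS = ("NEBIUS_API_KEY", "GITHUB_TOKEN")


def load_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Return the required keys from env, or raise if any is missing or blank."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

In an app this would typically be called as `load_settings(os.environ)` after python-dotenv's `load_dotenv()` has read the `.env` file.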
Endpoint: POST /summarize
Accepts a GitHub repository URL, fetches the repository contents, and returns a summary generated by an LLM.
Request body:

    {
      "github_url": "https://github.com/psf/requests"
    }

| Field | Type | Required | Description |
|---|---|---|---|
| `github_url` | string | Yes | URL of a public GitHub repository |

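
To illustrate what validating `github_url` might involve, here is a minimal sketch that splits the URL into owner and repository name. The function name `parse_github_url` and the exact acceptance rules (only `github.com` hosts, optional `.git` suffix) are assumptions; the real application may validate differently.

```python
from urllib.parse import urlparse


def parse_github_url(url: str) -> tuple[str, str]:
    """Split a GitHub repository URL into (owner, repo); raise ValueError if invalid."""
    if not isinstance(url, str) or not url.strip():
        raise ValueError("github_url is missing")
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError("URL must start with http:// or https://")
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError("URL must point to github.com")
    # Drop empty segments produced by leading or trailing slashes.
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError("URL must contain both an owner and a repository name")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        raise ValueError("Repository name is empty")
    return owner, repo
```

A `ValueError` raised here would map naturally onto a `400 Bad Request` response.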
Response (200 OK):

    {
      "summary": "Requests is a popular Python library for making HTTP requests...",
      "technologies": ["Python", "urllib3", "certifi"],
      "structure": "The project follows a standard Python package layout with the main source code in src/requests/, tests in tests/, and documentation in docs/."
    }

| Field | Type | Description |
|---|---|---|
| `summary` | string | A human-readable description of what the project does |
| `technologies` | string[] | List of main technologies, languages, and frameworks used |
| `structure` | string | Brief description of the project structure and organization |

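
For reference, a client could call the endpoint with nothing but the standard library, as sketched below. The helper names, the `http://localhost:8000` base URL in the usage note, the 120-second timeout, and the `Content-Type: application/json` header are assumptions; the README itself only specifies the JSON body.

```python
import json
import urllib.request


def build_summarize_request(base_url: str, github_url: str) -> urllib.request.Request:
    """Build the POST /summarize request without sending it."""
    payload = json.dumps({"github_url": github_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/summarize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(base_url: str, github_url: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_summarize_request(base_url, github_url)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `summarize("http://localhost:8000", "https://github.com/psf/requests")` should return a dictionary with the `summary`, `technologies`, and `structure` keys described above.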
Error Response:

    {
      "status": "error",
      "message": "Description of what went wrong"
    }

Possible Errors:
- `400 Bad Request`: Missing URL or invalid URL format.
- `502 Bad Gateway`: Error communicating with the GitHub API (e.g., repository not found).
- `500 Internal Server Error`: Error generating or parsing the LLM response.

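
One possible way to keep the error body and status codes consistent is a single helper, sketched below. The failure-kind labels and the names `ERROR_STATUS` and `make_error` are hypothetical; only the three status codes and the body shape come from this README.

```python
from typing import Dict, Tuple

# Hypothetical failure kinds mapped to the status codes listed above.
ERROR_STATUS = {
    "invalid_request": 400,  # missing URL or invalid URL format
    "github_failure": 502,   # error communicating with the GitHub API
    "llm_failure": 500,      # error generating or parsing the LLM response
}


def make_error(kind: str, message: str) -> Tuple[Dict[str, str], int]:
    """Return the documented error body together with its HTTP status code."""
    return {"status": "error", "message": message}, ERROR_STATUS[kind]
```

Flask 1.1 and later serialize a `(dict, status)` tuple returned from a view as a JSON response, so a view could return `make_error(...)` directly.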
Endpoint: GET /health
Returns the server status. Used for monitoring.
Response:

    {
      "status": "ok"
    }

The summary is produced in two LLM calls:

- `analyze_repo_structure`: Fetches the repository file tree (up to depth 2) and the `README.md`, then sends them to the LLM with a prompt to return a JSON array of up to 6 of the most important files (configs, entry points, core logic).
- `generate_repo_summary`: Downloads the source code of the selected files via the GitHub API, then sends the full context (tree, README, and code) in a second LLM call with a strict Pydantic response schema (`RepoSummarySchema`) to produce the final result.
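
The glue between the two calls can be sketched as follows: trim the file tree to depth 2 before the first call, then sanity-check the file list the LLM returns. This is a sketch under stated assumptions, not the application's code. The function names are hypothetical, "depth" is assumed to mean the number of path components, and dropping paths the LLM invented is an added safeguard the README does not mention.

```python
import json

MAX_FILES = 6  # upper bound on selected files, per the description above
MAX_DEPTH = 2  # assumed to count path components, e.g. "src/app.py" is depth 2


def limit_depth(paths, max_depth=MAX_DEPTH):
    """Keep only paths with at most max_depth components."""
    return [p for p in paths if len(p.strip("/").split("/")) <= max_depth]


def parse_selected_files(llm_output, known_paths, max_files=MAX_FILES):
    """Parse the LLM's JSON array, drop unknown or duplicate paths, cap the count."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError("LLM did not return valid JSON") from exc
    if not isinstance(data, list):
        raise ValueError("LLM output must be a JSON array of file paths")
    known = set(known_paths)
    selected = []
    for item in data:
        # Ignore non-strings, paths absent from the tree, and repeats.
        if isinstance(item, str) and item in known and item not in selected:
            selected.append(item)
    return selected[:max_files]
```

A `ValueError` from `parse_selected_files` would correspond to the `500 Internal Server Error` case, since it signals an unparseable LLM response.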