GitHub Repository Summarizer API

This is a Flask-based REST API application that leverages Large Language Models (LLMs) via LangChain to automatically analyze and summarize any public GitHub repository.

The application scans the project structure, reads the README.md, uses AI to select the most important source code files, and generates a structured JSON response describing the project, its technology stack, and its architecture based on those files.

Features

Smart file selection: The LLM analyzes the file tree and selects up to 6 key files to understand the project's logic, ignoring binaries and clutter.
Deep analysis: Reads the contents of the selected files (with a size limit to avoid exceeding the context window) and bases the project description on them.
Structured output: Uses LangChain's JsonOutputParser with a Pydantic schema to parse the LLM output into the expected JSON shape. Output that cannot be parsed is reported as a 500 error.
Logging: Detailed execution logs (including LLM prompts and responses) are saved to app.log.
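
The file filtering and size limit described above could look roughly like the sketch below. It is not the project's actual code: the names `filter_tree` and `truncate`, the skip lists, and the 8000-character limit are all assumptions made for illustration, and the README does not say whether clutter is filtered in code or left entirely to the LLM prompt.

```python
from typing import Iterable, List

# Hypothetical values: the README does not state the real limit or skip lists.
MAX_FILE_CHARS = 8000
SKIPPED_SUFFIXES = (".png", ".jpg", ".gif", ".ico", ".pdf", ".zip", ".lock")
SKIPPED_DIRS = ("node_modules/", ".git/", "dist/", "__pycache__/")


def is_candidate(path: str) -> bool:
    """Return True if the path looks like readable source rather than clutter."""
    lowered = path.lower()
    if lowered.endswith(SKIPPED_SUFFIXES):
        return False
    # Reject the directory both at the repository root and when nested.
    return not any(lowered.startswith(d) or f"/{d}" in lowered for d in SKIPPED_DIRS)


def filter_tree(paths: Iterable[str]) -> List[str]:
    """Drop binaries and vendored directories before the tree is shown to the LLM."""
    return [p for p in paths if is_candidate(p)]


def truncate(text: str, limit: int = MAX_FILE_CHARS) -> str:
    """Cut file contents to at most `limit` characters and mark the cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n... [truncated]"
```

Truncating by characters is only a rough proxy for tokens. A tokenizer-based limit would be more precise but depends on the model in use.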

Requirements

Python 3.9+
GitHub Personal Access Token (authenticated requests get a much higher GitHub API rate limit than anonymous ones)
LLM Provider API Key (configured for Nebius API with Llama 3.1 / Qwen models in this setup)

Installation and Setup

Clone the repository (or create a project folder):

   git clone 
   cd

Create and activate a virtual environment:

   python -m venv venv
   source venv/bin/activate  # On Windows use: venv\Scripts\activate

Install dependencies:

   pip install flask requests python-dotenv langchain-openai pydantic langchain-core gunicorn

Configure environment variables: Create a .env file in the root of the project and add your keys:

   NEBIUS_API_KEY=your_nebius_api_key
   GITHUB_TOKEN=your_github_token

Start the server with Gunicorn:

   gunicorn app:app -w 4 -b 0.0.0.0:8000

The server listens on all network interfaces on port 8000. From the same machine it is reachable at http://localhost:8000.

Gunicorn parameters:

-w 4 — number of worker processes (adjust based on CPU cores)
-b 0.0.0.0:8000 — bind to all interfaces on port 8000
app:app — module name and Flask app object
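
Because each Gunicorn worker imports the app separately, it can help to fail fast when a key from the .env file is missing. The sketch below shows one way to check this at startup. The function name `load_settings` is hypothetical, and it assumes the variables are already in the process environment (for example after python-dotenv's `load_dotenv()` has been called).

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the .env example in the setup steps.
REQUIRED_KEYS = ("NEBIUS_API_KEY", "GITHUB_TOKEN")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: source[key] for key in REQUIRED_KEYS}
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to exercise in tests.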

API Documentation

1. Get Repository Summary

Endpoint: POST /summarize

Accepts a GitHub repository URL, fetches the repository contents, and returns a summary generated by an LLM.

Request body:

{
  "github_url": "https://github.com/psf/requests"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `github_url` | `string` | Yes | URL of a public GitHub repository |
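
As an illustration of the request format, the following sketch builds and sends the call using only the Python standard library. The helper names, the `http://localhost:8000` base URL, and the 120-second timeout are assumptions. The timeout is generous because the endpoint waits on LLM calls, but the README does not state typical response times.

```python
import json
import urllib.request


def build_summarize_request(base_url: str, github_url: str) -> urllib.request.Request:
    """Build a POST /summarize request with the JSON body shown above."""
    payload = json.dumps({"github_url": github_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/summarize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(base_url: str, github_url: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_summarize_request(base_url, github_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running locally, `summarize("http://localhost:8000", "https://github.com/psf/requests")` would return the parsed response body. Non-2xx responses surface as `urllib.error.HTTPError`.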

Response (200 OK):

{
  "summary": "Requests is a popular Python library for making HTTP requests...",
  "technologies": ["Python", "urllib3", "certifi"],
  "structure": "The project follows a standard Python package layout with the main source code in src/requests/, tests in tests/, and documentation in docs/."
}

| Field | Type | Description |
| --- | --- | --- |
| `summary` | `string` | A human-readable description of what the project does |
| `technologies` | `string[]` | List of main technologies, languages, and frameworks used |
| `structure` | `string` | Brief description of the project structure and organization |
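
The response fields map naturally onto a Pydantic model. The sketch below is a guess at what RepoSummarySchema might look like, inferred from the table above. Only the class name comes from this README; the field definitions and description strings are assumptions.

```python
from typing import List

from pydantic import BaseModel, Field


class RepoSummarySchema(BaseModel):
    """Assumed shape of the final response, mirroring the documented fields."""

    summary: str = Field(description="Human-readable description of what the project does")
    technologies: List[str] = Field(description="Main technologies, languages, and frameworks used")
    structure: str = Field(description="Brief description of the project structure")
```

LangChain's JsonOutputParser accepts a model like this through its `pydantic_object` argument and uses it to derive the format instructions sent to the LLM.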

Error Response:

{
  "status": "error",
  "message": "Description of what went wrong"
}

Possible Errors:

400 Bad Request: Missing URL or invalid URL format.
502 Bad Gateway: Error communicating with the GitHub API (e.g., repository not found).
500 Internal Server Error: Error generating or parsing the LLM response.
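
The 400 case can be illustrated with a small validator that extracts the owner and repository name from the URL. This is a sketch under assumptions: `parse_github_url` and `error_body` are hypothetical names, and the real application may accept or reject different URL forms.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse


def parse_github_url(url: str) -> Tuple[str, str]:
    """Return (owner, repo) or raise ValueError, which a handler could map to 400."""
    if not isinstance(url, str) or not url.strip():
        raise ValueError("github_url is required")
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError("github_url must start with http:// or https://")
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError("github_url must point to github.com")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError("github_url must include an owner and a repository name")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # accept clone-style URLs
    if not repo:
        raise ValueError("github_url must include a repository name")
    return owner, repo


def error_body(message: str) -> Dict[str, str]:
    """Build the error payload in the format documented above."""
    return {"status": "error", "message": message}
```

In a Flask view, a caught `ValueError` could then be returned as `error_body(str(exc))` together with status code 400.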

2. Server Health Check

Endpoint: GET /health

Returns the server status. Used for monitoring.

Response:

{
  "status": "ok"
}

How it works under the hood (Architecture)

analyze_repo_structure: Fetches the repository file tree (up to depth 2) and the README.md, then sends both to the LLM with a prompt asking for a JSON array of up to 6 of the most important files (configs, entry points, core logic).
generate_repo_summary: Downloads the contents of the selected files via the GitHub API, then sends the full context (tree, README, and code) in a second LLM call with a strict Pydantic response schema (RepoSummarySchema) to produce the final result.
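
The data handling around the two LLM calls can be sketched without any network or model access. The helpers below are hypothetical and show one plausible reading of the steps above: limit the tree to depth 2, keep at most 6 of the LLM's picks that actually exist in the tree, and assemble the context for the second call. How the real code counts depth and formats the prompt is not stated here, so both are assumptions.

```python
from typing import Dict, Iterable, List

MAX_DEPTH = 2  # from the description above
MAX_FILES = 6  # from the description above


def limit_depth(paths: Iterable[str], max_depth: int = MAX_DEPTH) -> List[str]:
    """Keep paths at most `max_depth` levels deep (assumes 'a/b.py' is depth 2)."""
    return [p for p in paths if p.count("/") < max_depth]


def pick_files(llm_choice: Iterable[str], tree: Iterable[str],
               max_files: int = MAX_FILES) -> List[str]:
    """Drop paths the LLM invented, remove duplicates, and cap the list."""
    known = set(tree)
    picked: List[str] = []
    for path in llm_choice:
        if path in known and path not in picked:
            picked.append(path)
    return picked[:max_files]


def build_context(tree: Iterable[str], readme: str, files: Dict[str, str]) -> str:
    """Join tree, README, and file contents into one prompt context string."""
    sections = ["## File tree", "\n".join(tree), "## README", readme]
    for path, content in files.items():
        sections.append(f"## File: {path}")
        sections.append(content)
    return "\n\n".join(sections)
```

Checking the LLM's picks against the known tree avoids requesting files that do not exist, which would otherwise show up as GitHub API errors.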
