
Self-Updating Wiki for Your Codebases with LLM

Star 🌟 CocoIndex if you like it!!

This example shows how to use instructor with Gemini to analyze multiple Python codebases and generate markdown documentation using CocoIndex v1.

What It Does

- Scans subdirectories of a root directory (each expected to be a separate Python project)
- Extracts per-file information with an LLM into a unified CodebaseInfo model:
  - Public classes and functions with functionality summaries
  - CocoIndex app call relationship graphs (Mermaid format)
  - File-level summaries
- Aggregates the file-level CodebaseInfo into a project-level summary
- Writes markdown documentation to output/PROJECT_NAME.md
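
The unified model and the aggregation step could look roughly like the sketch below. It uses standard-library dataclasses so it runs without dependencies, whereas the example itself relies on Pydantic models. The field names (`public_classes`, `public_functions`, `mermaid_graphs`), the `SymbolInfo` type, and the `merge_infos` helper are assumptions for illustration only; how the example actually writes the project-level summary is not shown here, so this sketch simply concatenates the file summaries.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolInfo:
    """A public class or function plus a short summary (hypothetical shape)."""
    name: str
    summary: str


@dataclass
class CodebaseInfo:
    """Used for both a single file and a whole project (field names assumed)."""
    name: str
    summary: str
    public_classes: list[SymbolInfo] = field(default_factory=list)
    public_functions: list[SymbolInfo] = field(default_factory=list)
    mermaid_graphs: list[str] = field(default_factory=list)


def merge_infos(project_name: str, files: list[CodebaseInfo]) -> CodebaseInfo:
    """Naively fold file-level infos into one project-level info."""
    merged = CodebaseInfo(
        name=project_name,
        summary=" ".join(f.summary for f in files),
    )
    for f in files:
        merged.public_classes.extend(f.public_classes)
        merged.public_functions.extend(f.public_functions)
        merged.mermaid_graphs.extend(f.mermaid_graphs)
    return merged
```

Because the input and output share one type, the same rendering code can serve both a single file and a whole project.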

Key Features

- Instructor Integration: Uses the instructor library for structured LLM outputs with Pydantic
- Unified Data Model: Same CodebaseInfo type for both file-level and project-level extraction
- LLM-Generated Mermaid Graphs: The LLM generates Mermaid syntax directly with:
  - Bold text for @coco.fn decorated functions
  - Thick arrows (==>) for mount/use_mount calls
- Incremental Processing: CocoIndex handles caching, so only changed files are re-processed
- Multi-Project Support: Processes multiple codebases in parallel
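
The idea behind incremental processing can be illustrated with a small content-fingerprint cache. This is a plain-Python sketch of the general technique, not CocoIndex's actual mechanism; `FingerprintCache` and `summarize` are hypothetical names, and `summarize` stands in for the LLM call.

```python
import hashlib
from typing import Callable


class FingerprintCache:
    """Re-run an expensive function only when the input content changes."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self._fn = fn
        # path -> (content digest, cached result)
        self._results: dict[str, tuple[str, str]] = {}
        self.calls = 0  # how many times fn actually ran

    def get(self, path: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self._results.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged file: reuse the earlier result
        self.calls += 1
        result = self._fn(content)
        self._results[path] = (digest, result)
        return result


def summarize(content: str) -> str:
    """Stand-in for an LLM call (hypothetical)."""
    return f"{len(content.splitlines())} line(s)"
```

Calling `get` twice with identical content runs `summarize` once; editing the file changes the digest and triggers a second run.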

Output Format

The generated markdown includes:

- Overview - High-level project description
- Components - Classes and functions with summaries
- CocoIndex Pipeline - Mermaid diagrams (if CocoIndex is used)
- File Details - Per-file summaries (for multi-file projects)
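
A minimal sketch of how those four sections might be assembled, including the two conditional ones. `render_markdown`, the dictionary keys, and the heading levels are assumptions made for illustration and may differ from what main.py emits.

```python
def render_markdown(info: dict) -> str:
    """Render the sections listed above; conditional sections may be skipped."""
    fence = "`" * 3  # built at runtime to keep this example easy to embed
    lines = [f"# {info['name']}", "", "## Overview", "", info["summary"], ""]

    components = info.get("components", [])
    if components:
        lines += ["## Components", ""]
        lines += [f"- `{c['name']}`: {c['summary']}" for c in components]
        lines.append("")

    graphs = info.get("mermaid_graphs", [])
    if graphs:  # only when the project uses CocoIndex
        lines += ["## CocoIndex Pipeline", ""]
        for graph in graphs:
            lines += [f"{fence}mermaid", graph, fence, ""]

    files = info.get("files", [])
    if len(files) > 1:  # only for multi-file projects
        lines += ["## File Details", ""]
        lines += [f"- `{f['name']}`: {f['summary']}" for f in files]
        lines.append("")

    return "\n".join(lines)
```

A single-file project without CocoIndex calls would therefore produce only the Overview and Components sections.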

Example Mermaid Graph

graph TD
    %% App: SampleApp
    app_main[<b>app_main</b>] ==> process_file[<b>process_file</b>]
    process_file --> helper_func[helper_func]

Bold = @coco.fn, thick arrows (==>) = mount/use_mount calls
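
In the example the LLM writes this syntax directly, but the two conventions are simple enough to state in code. The sketch below builds the same graph from a list of calls; `node`, `to_mermaid`, and the `(caller, callee, is_mount)` triple format are hypothetical helpers, not part of the example.

```python
def node(name: str, is_coco_fn: bool) -> str:
    """Render a node definition; @coco.fn functions get a bold label."""
    label = f"<b>{name}</b>" if is_coco_fn else name
    return f"{name}[{label}]"


def to_mermaid(app_name: str, coco_fns: set[str],
               calls: list[tuple[str, str, bool]]) -> str:
    """Build a graph from (caller, callee, is_mount) triples."""
    lines = ["graph TD", f"    %% App: {app_name}"]
    declared: set[str] = set()

    def ref(name: str) -> str:
        # Define the node with its label on first use, then refer to it by id.
        if name in declared:
            return name
        declared.add(name)
        return node(name, name in coco_fns)

    for caller, callee, is_mount in calls:
        arrow = "==>" if is_mount else "-->"  # thick arrow for mount/use_mount
        lines.append(f"    {ref(caller)} {arrow} {ref(callee)}")
    return "\n".join(lines)
```

Passing `{"app_main", "process_file"}` as `coco_fns` with the calls `app_main` to `process_file` (mount) and `process_file` to `helper_func` (plain call) reproduces the SampleApp graph shown above.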

Run

1. Install dependencies

pip install -e .

2. Set up environment variables

Create a .env file in the example directory:

echo "GEMINI_API_KEY=your_api_key_here" > .env

Replace your_api_key_here with your actual Gemini API key.

Optionally, set a different LLM model:

echo "LLM_MODEL=gemini/gemini-2.5-flash" >> .env

3. Prepare your projects

Create a projects/ directory with subdirectories for each Python project:

projects/
├── my_project_1/
│   ├── main.py
│   └── utils.py
├── my_project_2/
│   └── app.py
└── ...

4. Run the application

cocoindex update main.py

This will:

- Scan all subdirectories in projects/
- Extract information from all .py files (excluding .venv* directories)
- Generate markdown documentation in output/
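
The scanning rules above can be approximated with pathlib alone. This is a sketch under assumptions: `list_projects` and `list_python_files` are hypothetical helpers, and the example itself relies on CocoIndex to walk the directory tree rather than on code like this.

```python
import pathlib


def list_projects(root: pathlib.Path) -> list[pathlib.Path]:
    """Treat each immediate subdirectory of root as one project."""
    return sorted(p for p in root.iterdir() if p.is_dir())


def list_python_files(project: pathlib.Path) -> list[pathlib.Path]:
    """Collect .py files under project, skipping any directory named .venv*."""
    files = []
    for path in project.rglob("*.py"):
        parent_dirs = path.relative_to(project).parts[:-1]
        if any(part.startswith(".venv") for part in parent_dirs):
            continue  # file lives inside a virtual environment
        files.append(path)
    return sorted(files)
```

Loose files directly under the root directory are ignored, since only subdirectories count as projects.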

5. Verify the output

ls -la output/
cat output/my_project_1.md

Customization

Change Input/Output Directories

Edit the app definition in main.py:

app = coco.App(
    app_main,
    coco.AppConfig(name="MultiCodebaseSummarization"),
    root_dir=pathlib.Path("./your_projects_dir"),
    output_dir=pathlib.Path("./your_output_dir"),
)

Use a Different LLM

Set the LLM_MODEL environment variable to any LiteLLM-supported model:

# OpenAI
export LLM_MODEL=gpt-4o

# Anthropic
export LLM_MODEL=anthropic/claude-3-5-sonnet

# Local (Ollama)
export LLM_MODEL=ollama/llama3.2
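
On the Python side, picking up the variable could be as simple as the sketch below. `resolve_model` is a hypothetical helper, and the fallback value is an assumption borrowed from the .env example earlier in this README; check main.py for the default the example really uses.

```python
import os
from typing import Mapping, Optional

# Assumed fallback, taken from the .env example; the real default may differ.
DEFAULT_MODEL = "gemini/gemini-2.5-flash"


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return LLM_MODEL if it is set and non-empty, else the fallback."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL") or DEFAULT_MODEL
```

Passing a mapping explicitly keeps the function easy to test; calling `resolve_model()` with no arguments reads the process environment.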

How It Works

graph TD
    %% App: MultiCodebaseSummarization
    app_main[<b>app_main</b>] ==> process_project[<b>process_project</b>]
    process_project ==> extract_file_info[<b>extract_file_info</b>]
    process_project ==> aggregate_project_info[<b>aggregate_project_info</b>]
    process_project --> generate_markdown[generate_markdown]

- app_main: Lists subdirectories, sets up output target, mounts process_project for each
- process_project: Extracts info from each file, aggregates, outputs markdown
- extract_file_info: Uses instructor + LLM to extract CodebaseInfo from each file
- aggregate_project_info: Combines file CodebaseInfo into project-level CodebaseInfo
- generate_markdown: Converts CodebaseInfo to markdown and calls declare_file
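
The data flow between these five functions can be followed in a dependency-free sketch. The function names mirror the list above, but everything else is an assumption: the bodies are plain-Python stand-ins, dictionaries replace CodebaseInfo, the `Extractor` callable replaces the instructor + LLM call, and results are returned instead of being mounted or declared through CocoIndex.

```python
from typing import Callable

# Stand-in for the LLM call: maps (file name, source text) to a summary.
Extractor = Callable[[str, str], str]


def extract_file_info(name: str, source: str, llm: Extractor) -> dict:
    return {"name": name, "summary": llm(name, source)}


def aggregate_project_info(project: str, files: list[dict]) -> dict:
    summary = f"{project} contains {len(files)} file(s)."
    return {"name": project, "summary": summary, "files": files}


def generate_markdown(info: dict) -> str:
    lines = [f"# {info['name']}", "", info["summary"], ""]
    lines += [f"- {f['name']}: {f['summary']}" for f in info["files"]]
    return "\n".join(lines) + "\n"


def process_project(project: str, sources: dict[str, str], llm: Extractor) -> str:
    """Extract per file, aggregate, then render, in the order described above."""
    files = [extract_file_info(n, s, llm) for n, s in sorted(sources.items())]
    return generate_markdown(aggregate_project_info(project, files))


def app_main(projects: dict[str, dict[str, str]], llm: Extractor) -> dict[str, str]:
    """Return {output file name: markdown}, one entry per project."""
    return {
        f"{name}.md": process_project(name, sources, llm)
        for name, sources in projects.items()
    }
```

Injecting the extractor as a parameter makes it possible to exercise the whole flow with a fake LLM before spending API calls.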
