PPT-RAG is an end-to-end intelligent assistant designed specifically for understanding presentation files (PPT/PPTX) as multimodal learning material.
Unlike conventional plain-text document QA systems, this project treats each slide as a structured multimodal unit that may contain concise bullet text, dense visual cues, tables, mathematical equations, and layout-level semantics. The system augments multimodal RAG with PPT-specific preprocessing to recover hidden information that presenters often omit from slides but imply through short phrases, visual context, and page organization.
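As a rough sketch of what "structured multimodal unit" can mean in practice, the snippet below models a parsed slide as a list of typed content items with a page index and layout metadata. The class and field names are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ContentType(Enum):
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    EQUATION = "equation"


@dataclass
class ContentItem:
    """One typed element extracted from a slide (hypothetical schema)."""
    type: ContentType
    page_index: int    # slide number, preserved for slide-level reasoning
    payload: Any       # raw text, image path, table rows, or LaTeX string
    metadata: dict = field(default_factory=dict)  # layout hints: placeholder role, bounding box, ...


@dataclass
class Slide:
    """All content items parsed from a single slide."""
    page_index: int
    items: list[ContentItem] = field(default_factory=list)

    def text(self) -> str:
        # Concatenate textual items, e.g. as input for topic extraction.
        return "\n".join(i.payload for i in self.items if i.type is ContentType.TEXT)
```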
The project targets two primary student-centered scenarios:
- Before-class preview: Quickly build a conceptual overview of what a lecture deck covers and establish connections between key topics.
- Before-exam review: Retrieve key points, relationships, and visual evidence across long slide decks with semantic understanding.
At runtime, users can interact through a web UI, QQ, or WeChat. Under the hood, an intelligent agent orchestrates multimodal retrieval and targeted image understanding to produce grounded, evidence-backed answers.
Core features of the system include:
- PPT-first multimodal RAG pipeline: Grounded QA across text, images, tables, and mathematical content with entity-relationship awareness.
- Hidden-information expansion for concise slides: Detects high-compression pages and expands their implicit content into grounded, explanatory text based on slide context (a minimal detection sketch follows this list).
- Page-topic extraction and structural linking: Extracts per-page topics and semantically links related slides to model section-level continuity and hierarchical structure in long decks.
- Multi-channel accessibility: One unified backend supports Web, QQ, and WeChat, making the assistant seamlessly accessible within familiar student study workflows.
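The hidden-information detection mentioned above can be approximated with simple density heuristics: a slide with very little text but at least one visual element is a candidate for context-grounded expansion. The thresholds and prompt wording below are assumptions for illustration only, reusing the `Slide` sketch from earlier.

```python
def is_high_compression(slide: Slide, max_words: int = 25, min_visuals: int = 1) -> bool:
    """Heuristic: flag slides whose sparse text likely implies unstated content."""
    words = len(slide.text().split())
    visuals = sum(
        1 for i in slide.items
        if i.type in (ContentType.IMAGE, ContentType.TABLE, ContentType.EQUATION)
    )
    return words <= max_words and visuals >= min_visuals


def expansion_prompt(slide: Slide, neighbors: list[Slide]) -> str:
    """Build an LLM prompt that grounds the expansion in the surrounding slides."""
    context = "\n---\n".join(s.text() for s in neighbors)
    return (
        "Expand the slide's bullet points into full explanatory text. "
        "Only state what is supported by the slide and its neighboring slides.\n\n"
        f"Slide {slide.page_index}:\n{slide.text()}\n\nNeighboring slides:\n{context}"
    )
```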
The architecture follows a top-down, retrieval-augmented agent design:
- Ingests PPT/PPTX documents and converts them into typed, structured content items (text, images, tables, mathematical equations).
- Preserves page indices and comprehensive structural metadata for downstream slide-level semantic reasoning.
- Applies project-specific semantic enhancement after initial parsing.
- Performs hidden-information candidate detection and context-grounded expansion to recover presenter intent.
- Extracts page-level topics and computes topic similarity to identify and connect related slide groups across sections (see the topic-linking sketch after this list).
- Inserts enriched multimodal content into a knowledge graph-backed storage system.
- Organizes entities, relationships, semantic chunks, and document references for hybrid retrieval that combines semantic and lexical matching (a rank-fusion sketch follows the list).
- Runs a tool-calling agent loop that decides when and how to apply retrieval, reasoning, and image understanding (see the agent-loop sketch after this list).
- Combines retrieval evidence with optional iterative image analysis and visual reasoning.
- Generates final answers with document-grounded evidence chains rather than free-form generation.
- Exposes the unified reasoning backend through multiple user interfaces:
  - Streamlit-based web application with interactive visualization.
  - QQ bot runtime pipeline for instant messaging integration.
  - WeChat bot runtime pipeline for social platform accessibility.
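For the topic-linking step noted in the enhancement bullets above, one plausible implementation embeds each page's extracted topic and connects pages whose cosine similarity exceeds a threshold. The embedding source and the threshold value here are placeholders, not the project's configuration.

```python
import numpy as np


def link_related_pages(topic_vectors: np.ndarray, threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return pairs of page indices whose topic embeddings are similar enough to link.

    topic_vectors: (num_pages, dim) array with one embedding per page topic.
    """
    # Normalize rows so the dot product equals cosine similarity.
    norms = np.linalg.norm(topic_vectors, axis=1, keepdims=True)
    unit = topic_vectors / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    return [
        (i, j)
        for i in range(len(sim))
        for j in range(i + 1, len(sim))
        if sim[i, j] >= threshold
    ]
```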
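The hybrid retrieval described above is commonly realized by running a lexical ranker (e.g. BM25) and a vector ranker in parallel and merging their outputs. The sketch below shows only a reciprocal-rank-fusion merge and assumes both rankers already return ordered chunk IDs; this is a generic technique, not necessarily the fusion rule the project uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs, favoring items ranked high in any list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# Example: the lexical and semantic rankers disagree; fusion balances both orderings.
fused = reciprocal_rank_fusion([["c3", "c1", "c7"], ["c1", "c9", "c3"]])
```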
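Finally, the agent loop referenced above can be pictured as a bounded iterate-until-answer cycle: at each turn the model either calls a tool (retrieval or image analysis) or emits a final, evidence-backed answer. The `llm_step` interface and tool names below are hypothetical.

```python
def agent_loop(question: str, llm_step, tools: dict, max_turns: int = 6) -> str:
    """Minimal tool-calling loop; llm_step returns either a tool call or a final answer."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        action = llm_step(history)              # e.g. {"tool": "retrieve", "args": {...}}
        if "final_answer" in action:
            return action["final_answer"]       # answer grounded in the collected evidence
        observation = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "name": action["tool"], "content": observation})
    return "No grounded answer found within the turn limit."
```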


