banana-slides

A native AI PPT generation application based on nano banana pro🍌
Go from idea to presentation in minutes, with no tedious formatting and with edits made by simply describing them, moving toward a true "Vibe PPT"

🚀 Online Demo  •  📚 Documentation  •  English


If this project is helpful to you, please consider giving it a Star 🌟 & Fork 🍴

✨ Project Origin

Have you ever found yourself in this predicament: the presentation is due tomorrow, but your PPT is still a blank slate; you have countless brilliant ideas in your head, but all your enthusiasm is drained by tedious layout and design?

We long to quickly create presentations that are both professional and well-designed. While traditional AI PPT generation apps generally meet the "speed" requirement, they still suffer from the following issues:

  • 1️⃣ Limited to preset templates, with no flexibility to adjust styles
  • 2️⃣ Low degree of freedom, making it difficult to perform multiple rounds of revisions
  • 3️⃣ Similar visual results with severe homogenization
  • 4️⃣ Lower quality of assets and a lack of specific relevance
  • 5️⃣ Disjointed text-image layouts and poor design sense

These deficiencies make it difficult for traditional AI PPT generators to simultaneously satisfy our two major needs: "speed" and "beauty." Even those claiming to be "Vibe PPT" are, in my eyes, still far from being truly "Vibe."

However, the emergence of the nano banana🍌 model has changed everything. I tried using 🍌pro to generate PPT pages and found that the results were excellent in terms of quality, aesthetics, and consistency. It can accurately render almost all the text requested in the prompt while following the style of reference images. So, why not build a native "Vibe PPT" application based on 🍌pro?

👨‍💻 Usage Scenarios

  1. Beginners: Quickly generate beautiful PPTs with zero barrier to entry and no design experience required, reducing the hassle of choosing templates.
  2. PPT Professionals: Reference AI-generated layouts and combinations of graphics and text to quickly gain design inspiration.
  3. Educators: Quickly convert teaching content into illustrated lesson plan PPTs to enhance classroom effectiveness.
  4. Students: Quickly complete class presentations and focus energy on content rather than layout and beautification.
  5. Business Professionals: Quickly visualize business proposals and product introductions with fast adaptation to multiple scenarios.

🎯 Goal: Lower the barrier to entry for PPT creation, empowering everyone to quickly create beautiful and professional presentations.

🎨 Result Examples

  • Software Development Best Practices
  • DeepSeek-V3.2 Technology Showcase
  • R&D and Industrialization of Intelligent Production Line Equipment for Prepared Dishes
  • The Evolution of Money: A Journey from Shells to Banknotes

See more Use Cases

🎯 Features

1. Flexible and Diverse Creative Paths

Supports three starting modes: Idea, Outline, and Page Description, catering to different creative habits.

  • One-sentence Generation: Enter a topic, and AI automatically generates a well-structured outline and page-by-page content descriptions.
  • Natural Language Editing: Supports modifying the outline or description through conversational "Vibes" (e.g., "change the third page to a case study"), with AI responding and adjusting in real-time.
  • Outline/Description Mode: Supports both one-click batch generation and manual adjustment of details.

2. Powerful Asset Parsing Capabilities

  • Multi-format Support: Upload PDF, Docx, MD, Txt, and other files for automatic background content parsing.
  • Intelligent Extraction: Automatically identify key points, image links, and chart information in the text to provide rich source material for generation.
  • Style Reference: Support uploading reference images or templates to customize PPT styles.

3. "Vibe"-style Natural Language Modification

Instead of navigating complex menus and buttons, issue modification commands directly in natural language.

  • In-painting: Perform conversational modifications on specific areas (e.g., "Change this chart to a pie chart").
  • Full-page Optimization: Generate high-definition, stylistically consistent pages based on nano banana pro🍌.

4. Out-of-the-box Format Export

  • Multi-format Support: One-click export to standard PPTX or PDF files.
  • Perfect Fit: Default 16:9 aspect ratio, no manual layout adjustments needed, ready for presentation.

5. Editable PPTX Export (Beta in Progress)

  • Export images as high-fidelity, clean-background PPT pages with freely editable images and text
  • See Anionex#121 for related updates

🌟 Comparison with NotebookLM slide deck features

| Feature | NotebookLM | This Project |
|---|---|---|
| Page Limit | 15 pages | Unlimited |
| Secondary Editing | Prompt-based modification | Selection-based editing + Verbal editing |
| Asset Addition | Cannot add after generation | Add freely after generation |
| Export Formats | Supports PDF, (non-editable image) pptx | Export as PDF, (image or editable) pptx |
| Watermark | Watermarked in free version | No watermark, freely add/remove elements |

Note: This comparison may become outdated as new features are added

🔥 Recent Updates

  • 【2-9】:

    • New Features
      • Support for pasting images on the home page, outline, and description cards for immediate recognition, providing a better interactive experience.
      • Manual Outline Editing: Supports manually adjusting the chapter (part) a page belongs to.
      • Docker Multi-architecture: Images now support amd64 / arm64 builds.
      • Internationalization + Dark Mode: Added Chinese-English switching; supports light/dark/follow-system themes; all components adapted for dark mode.
    • Fixes and Experience Optimizations
      • Fixed export-related 500 errors, reference file association timing, outline/page data misalignment, task polling errors, infinite polling in description generation, image preview memory leaks, and partial failure handling in batch deletion.
      • Optimized format example tips, HTTP error message copy, Modal closing experience, cleaned up old project localStorage, and removed redundant prompts during initial project creation.
      • Various other optimizations and fixes.
  • 【1-4】: v0.4.0 release, a major upgrade for editable pptx export:

    • Supports maximum restoration of text font size, color, bolding, and other styles from images;
    • Added support for recognizing text content within tables;
    • More precise logic for text size and position restoration;
    • Optimized export workflow, significantly reducing residual text on background images after export;
    • Supports page multi-select logic, allowing flexible selection of specific pages to generate and export.
    • Detailed effects and usage can be found at Anionex#121
  • 【12-27】: Added support for a template-free mode and high-quality text presets; you can now control PPT page styles through pure text descriptions.

🗺️ Roadmap

| Status | Milestone |
|---|---|
| ✅ Completed | Create PPT via three paths: idea, outline, and page description |
| ✅ Completed | Parse Markdown-formatted images in text |
| ✅ Completed | Add more assets to single PPT slides |
| ✅ Completed | Area selection and Vibe-style voice editing for single slides |
| ✅ Completed | Asset module: asset generation, upload, etc. |
| ✅ Completed | Support for multi-file upload and parsing |
| ✅ Completed | Support Vibe-style voice adjustments for outlines and descriptions |
| ✅ Completed | Preliminary support for exporting editable .pptx files |
| 🔄 In Progress | Support multi-layered, precise image cutout in editable .pptx exports |
| 🔄 In Progress | Web search |
| 🔄 In Progress | Agent mode |
| 🚍 Partial | Optimize frontend loading speed |
| 🧭 Planned | Online presentation/playback feature |
| 🧭 Planned | Simple animations and slide transitions |
| 🚍 Partial | Multi-language support |
| 🏢 Commercial Feature | User system |

📦 Usage

(New) One-click deployment using application templates

This is the simplest method, requiring no Docker installation or project downloading. You can access the application directly after creation.

  1. Deploy and start this application with one click via Rainyun (High bandwidth, suitable for HD image generation and downloading. New users enjoy a 15-day free trial).

Deploy on Rainyun

  2. Coming soon

Using Docker Compose🐳

Quickly start frontend and backend services using Docker Compose.

📒 Instructions for Windows/Mac Users

If you are using Windows or macOS, install Docker Desktop first and make sure Docker is running (check the system tray icon on Windows or the menu bar icon on macOS), then follow the steps below.

Tip: If you encounter issues, Windows users should enable the WSL 2 backend in Docker Desktop settings (recommended); also ensure that ports 3000 and 5000 are not occupied.
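Before starting the containers, the port requirement in the tip above can be checked with a few lines of Python. This is a minimal sketch, not part of the project: the helper name `port_is_free` is made up for illustration, and it only tests whether something is already listening on the local loopback address.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # 3000 and 5000 are the ports named in the tip above
    for port in (3000, 5000):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```

If a port is reported as in use, stop the process that holds it before running Docker Compose.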

  1. Clone the Repository

    git clone https://github.com/Anionex/banana-slides
    cd banana-slides

  2. Configure Environment Variables

Create the .env file (refer to .env.example):

cp .env.example .env

Edit the .env file and configure the necessary environment variables:

The LLM API in this project follows the AIHubMix platform format. It is recommended to use AIHubMix (Click here to visit) to obtain an API key to reduce migration costs.
Friendly Reminder: The Google Nano Banana Pro model API costs are relatively high, please be mindful of usage costs.

```env
# AI provider format (gemini / openai / vertex / lazyllm)
AI_PROVIDER_FORMAT=gemini

# Gemini format configuration (used when AI_PROVIDER_FORMAT=gemini)
GOOGLE_API_KEY=your-api-key-here
GOOGLE_API_BASE=https://generativelanguage.googleapis.com
# Proxy example: https://aihubmix.com/gemini

# OpenAI format configuration (used when AI_PROVIDER_FORMAT=openai)
OPENAI_API_KEY=your-api-key-here
OPENAI_API_BASE=https://api.openai.com/v1
# Proxy example: https://aihubmix.com/v1

# Vertex AI configuration (used when AI_PROVIDER_FORMAT=vertex)
# Requires a GCP project and a service account key
# VERTEX_PROJECT_ID=your-gcp-project-id
# VERTEX_LOCATION=global
# GOOGLE_APPLICATION_CREDENTIALS=./gcp-service-account.json

# Lazyllm format configuration (used when AI_PROVIDER_FORMAT=lazyllm)
# Select vendors for text generation and image generation
TEXT_MODEL_SOURCE=deepseek        # Text generation model provider
IMAGE_MODEL_SOURCE=doubao         # Image editing model provider
IMAGE_CAPTION_MODEL_SOURCE=qwen   # Image captioning model provider

# API keys for the various providers (only configure the ones you want to use)
DOUBAO_API_KEY=your-doubao-api-key            # Volcengine/Doubao
DEEPSEEK_API_KEY=your-deepseek-api-key        # DeepSeek
QWEN_API_KEY=your-qwen-api-key                # Alibaba Cloud/Qwen
GLM_API_KEY=your-glm-api-key                  # Zhipu GLM
SILICONFLOW_API_KEY=your-siliconflow-api-key  # SiliconFlow
SENSENOVA_API_KEY=your-sensenova-api-key      # SenseTime SenseNova
MINIMAX_API_KEY=your-minimax-api-key          # MiniMax
...
```

Use the new version of the editable export configuration method to achieve better editable export results: You need to obtain an API KEY from the Baidu AI Cloud Platform (click here to enter) and fill it in the BAIDU_API_KEY field in the .env file (there is a sufficient free usage quota). For details, see the instructions in Anionex#121.
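A quick way to catch an unfinished `.env` is to check that the keys for the selected `AI_PROVIDER_FORMAT` are filled in. The sketch below is an illustration only and is not how the backend loads its configuration: the parser is deliberately simplified, the helper names are hypothetical, and the choice of which keys count as required per format is an assumption read off the sample configuration above. The `lazyllm` format is not covered.

```python
# Keys assumed to be required for each format (an assumption based on
# the sample configuration, not taken from the backend code).
REQUIRED_KEYS = {
    "gemini": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "vertex": [
        "VERTEX_PROJECT_ID",
        "VERTEX_LOCATION",
        "GOOGLE_APPLICATION_CREDENTIALS",
    ],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comment lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "deepseek   # provider"
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """List required keys that are empty or still hold a 'your-...' placeholder."""
    required = REQUIRED_KEYS.get(env.get("AI_PROVIDER_FORMAT", ""), [])
    return [
        key
        for key in required
        if not env.get(key) or env[key].startswith("your-")
    ]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list once the keys for the chosen format have real values.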

📒 Vertex AI Configuration Guide (for GCP users)

Google Cloud Vertex AI allows calling Gemini models through GCP service accounts; new users can use promotional credits. Configuration steps:

  1. Go to the GCP Console, create a service account, and download the JSON format key file.
  2. Save the key file as gcp-service-account.json in the project root directory.
  3. Set in .env:
    AI_PROVIDER_FORMAT=vertex
    VERTEX_PROJECT_ID=your-gcp-project-id
    VERTEX_LOCATION=global
  4. If deploying with Docker, you also need to uncomment relevant sections in docker-compose.yml to mount the key file into the container and set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

The gemini-3-* series models require VERTEX_LOCATION=global
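The Vertex AI requirements above can be turned into a small pre-flight check. This is a sketch under stated assumptions rather than project code: `vertex_problems` is a hypothetical helper, the settings are passed in as a plain dictionary, and the model name used in the usage note is a placeholder, not a real model identifier.

```python
import json
from pathlib import Path


def vertex_problems(env: dict[str, str], model: str) -> list[str]:
    """Return human-readable problems found in Vertex AI settings."""
    problems = []
    if env.get("AI_PROVIDER_FORMAT") != "vertex":
        problems.append("AI_PROVIDER_FORMAT is not 'vertex'")
    if not env.get("VERTEX_PROJECT_ID"):
        problems.append("VERTEX_PROJECT_ID is empty")
    # Rule stated in the guide: gemini-3-* models need the global location
    if model.startswith("gemini-3-") and env.get("VERTEX_LOCATION") != "global":
        problems.append("gemini-3-* models need VERTEX_LOCATION=global")
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not key_path or not Path(key_path).is_file():
        problems.append("service account key file not found")
    else:
        try:
            json.loads(Path(key_path).read_text(encoding="utf-8"))
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            problems.append("service account key file is not valid JSON")
    return problems
```

Calling `vertex_problems(settings, "gemini-3-example")` with a complete configuration returns an empty list; each string in a non-empty result names one setting to fix.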

  3. Start Service

⚡ Use Pre-built Images (Recommended)

The project provides pre-built frontend and backend images on Docker Hub (synced with the latest version of the main branch), allowing you to skip local build steps for rapid deployment:

```bash
# Launch with pre-built images (no need to build from scratch)
docker compose -f docker-compose.prod.yml up -d
```

Image names:

  • anoinex/banana-slides-frontend:latest
  • anoinex/banana-slides-backend:latest

Build images from scratch

docker compose up -d

Tip

If you encounter network issues, you can uncomment the mirror source configuration in the .env file and then rerun the startup command:

# Uncomment the following in the .env file to use domestic mirror sources
DOCKER_REGISTRY=docker.1ms.run/
GHCR_REGISTRY=ghcr.nju.edu.cn/
APT_MIRROR=mirrors.aliyun.com
PYPI_INDEX_URL=https://mirrors.cloud.tencent.com/pypi/simple
NPM_REGISTRY=https://registry.npmmirror.com/
  4. Access the Application
  5. View Logs

    # View backend logs (last 200 lines)
    docker logs --tail 200 banana-slides-backend

    # Follow backend logs in real time (starting from the last 100 lines)
    docker logs -f --tail 100 banana-slides-backend

    # View frontend logs (last 100 lines)
    docker logs --tail 100 banana-slides-frontend

  6. Stop Services

    docker compose down

  7. Update Project

Using Pre-built Images (docker-compose.prod.yml)

docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d

Using Local Build (docker-compose.yml)

Note: This method does not apply if you have modified the code locally. In that case, first revert the code to the version you originally pulled.

git pull 
docker compose down
docker compose build --no-cache
docker compose up -d

Note: Thanks to the excellent developer friend @ShellMonster for providing a Newbie Deployment Tutorial. It is specifically designed for beginners without any server deployment experience. You can click the link to view it.

Deploy from source

Environment Requirements

  • Python 3.10 or higher
  • uv - Python package manager
  • Node.js 16+ and npm
  • A valid Google Gemini API key
  • (Optional) LibreOffice - Required when uploading PPTX files using the "PPT Refurbishment" feature, used for converting PPTX to PDF. It is recommended to convert PPTX to PDF locally before uploading. Reason: Server-side rendering by LibreOffice may cause layout misalignment due to missing fonts (e.g., Microsoft YaHei, Calibri, etc.) and cannot fully restore some special effects. LibreOffice is not required if you upload PDF files. For Docker users who still need to support PPTX uploads within the container, run:
    docker exec -it banana-slides-backend bash -c "apt-get update && apt-get install -y libreoffice-impress && rm -rf /var/lib/apt/lists/*"

    Note: LibreOffice installed this way will be lost when the container is rebuilt and must be reinstalled.

Backend Installation

  1. Clone the repository

    git clone https://github.com/Anionex/banana-slides
    cd banana-slides

  2. Install uv (if not already installed)

    curl -LsSf https://astral.sh/uv/install.sh | sh

  3. Install dependencies

Run the following command in the project root directory:

uv sync

This will automatically install all dependencies based on pyproject.toml.

  4. Configure environment variables

Copy the environment variable template:

cp .env.example .env

Edit the .env file and configure your API keys:

The LLM interfaces in this project follow the AIHubMix platform standard. It is recommended to use AIHubMix to obtain API keys to minimize migration costs.

```env
# AI provider format (gemini / openai / vertex)
AI_PROVIDER_FORMAT=gemini

# Gemini format configuration (used when AI_PROVIDER_FORMAT=gemini)
GOOGLE_API_KEY=your-api-key-here
GOOGLE_API_BASE=https://generativelanguage.googleapis.com
# Proxy example: https://aihubmix.com/gemini

# OpenAI format configuration (used when AI_PROVIDER_FORMAT=openai)
OPENAI_API_KEY=your-api-key-here
OPENAI_API_BASE=https://api.openai.com/v1
# Proxy example: https://aihubmix.com/v1

# Vertex AI configuration (used when AI_PROVIDER_FORMAT=vertex)
# Requires a GCP project and a service account key
# VERTEX_PROJECT_ID=your-gcp-project-id
# VERTEX_LOCATION=global
# GOOGLE_APPLICATION_CREDENTIALS=./gcp-service-account.json

# Modify this variable to control the backend service port
BACKEND_PORT=5000
...
```

Frontend Installation

  1. Navigate to the frontend directory

    cd frontend

  2. Install dependencies

    npm install

  3. Configure API address

The frontend will automatically connect to the backend service at http://localhost:5000. If you need to modify this, please edit src/api/client.ts.

Start Backend Service

(Optional) If you have important local data, it is recommended to back up the database before upgrading:
cp backend/instance/database.db backend/instance/database.db.bak
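The same backup can be scripted so that each copy gets a timestamp instead of overwriting the previous `.bak` file. This is a minimal sketch, assuming the default database path shown above; `backup_database` is a hypothetical helper, and the copy is best taken while the backend is stopped so the SQLite file is not being written to.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_path: str = "backend/instance/database.db") -> Path:
    """Copy the SQLite file next to itself with a timestamped .bak name."""
    src = Path(db_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    # copy2 also preserves the file's modification time
    shutil.copy2(src, dest)
    return dest
```

Calling `backup_database()` from the project root returns the path of the new backup file.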

cd backend
uv run alembic upgrade head && uv run python app.py

The backend service will start at http://localhost:5000.

Visit http://localhost:5000/health to verify that the service is running correctly.
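The health check can also be scripted, for example to wait for the backend in a startup script. The sketch below uses only the standard library and assumes nothing about the response body: it treats any HTTP 2xx answer from the `/health` URL as healthy. The function name `is_healthy` is made up for illustration.

```python
import urllib.request


def is_healthy(url: str = "http://localhost:5000/health", timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError derive from OSError, so this covers a
        # refused connection, a timeout, and 4xx/5xx responses
        return False
```

If the backend port was changed through `BACKEND_PORT`, pass the matching URL as the first argument.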

Start Frontend Development Server

cd frontend
npm run dev

The frontend development server will start at http://localhost:3000.

Open your browser to access and use the application.

🛠️ Technical Architecture

Frontend Tech Stack

  • Framework: React 18 + TypeScript
  • Build Tool: Vite 5
  • State Management: Zustand
  • Routing: React Router v6
  • UI Components: Tailwind CSS
  • Drag and Drop: @dnd-kit
  • Icons: Lucide React
  • HTTP Client: Axios

Backend Tech Stack

  • Language: Python 3.10+
  • Framework: Flask 3.0
  • Package Management: uv
  • Database: SQLite + Flask-SQLAlchemy
  • AI Capabilities: Google Gemini API
  • PPT Processing: python-pptx
  • Image Processing: Pillow
  • Concurrency Handling: ThreadPoolExecutor
  • CORS Support: Flask-CORS

📁 Project Structure

banana-slides/
├── frontend/                    # React frontend application
│   ├── src/
│   │   ├── pages/              # Page components
│   │   │   ├── Home.tsx        # Home (Create Project)
│   │   │   ├── OutlineEditor.tsx    # Outline editing page
│   │   │   ├── DetailEditor.tsx     # Detailed description editing page
│   │   │   ├── SlidePreview.tsx     # Slide preview page
│   │   │   └── History.tsx          # History version management page
│   │   ├── components/         # UI components
│   │   │   ├── outline/        # Outline-related components
│   │   │   │   └── OutlineCard.tsx
│   │   │   ├── preview/        # Preview-related components
│   │   │   │   ├── SlideCard.tsx
│   │   │   │   └── DescriptionCard.tsx
│   │   │   ├── shared/         # Shared components
│   │   │   │   ├── Button.tsx
│   │   │   │   ├── Card.tsx
│   │   │   │   ├── Input.tsx
│   │   │   │   ├── Textarea.tsx
│   │   │   │   ├── Modal.tsx
│   │   │   │   ├── Loading.tsx
│   │   │   │   ├── Toast.tsx
│   │   │   │   ├── Markdown.tsx
│   │   │   │   ├── MaterialSelector.tsx
│   │   │   │   ├── MaterialGeneratorModal.tsx
│   │   │   │   ├── TemplateSelector.tsx
│   │   │   │   ├── ReferenceFileSelector.tsx
│   │   │   │   └── ...
│   │   │   ├── layout/         # Layout components
│   │   │   └── history/        # History version components
│   │   ├── store/              # Zustand state management
│   │   │   └── useProjectStore.ts
│   │   ├── api/                # API interfaces
│   │   │   ├── client.ts       # Axios client configuration
│   │   │   └── endpoints.ts    # API endpoint definitions
│   │   ├── types/              # TypeScript type definitions
│   │   ├── utils/              # Utility functions
│   │   ├── constants/          # Constant definitions
│   │   └── styles/             # Style files
│   ├── public/                 # Static resources
│   ├── package.json
│   ├── vite.config.ts
│   ├── tailwind.config.js      # Tailwind CSS configuration
│   ├── Dockerfile
│   └── nginx.conf              # Nginx configuration
│
├── backend/                    # Flask backend application
│   ├── app.py                  # Flask application entry point
│   ├── config.py               # Configuration file
│   ├── models/                 # Database models
│   │   ├── project.py          # Project model
│   │   ├── page.py             # Page model (slide pages)
│   │   ├── task.py             # Task model (asynchronous tasks)
│   │   ├── material.py         # Material model (reference materials)
│   │   ├── user_template.py    # UserTemplate model (user templates)
│   │   ├── reference_file.py   # ReferenceFile model (reference files)
│   │   ├── page_image_version.py # PageImageVersion model (page versions)
│   ├── services/               # Service layer
│   │   ├── ai_service.py       # AI generation service (Gemini integration)
│   │   ├── file_service.py     # File management service
│   │   ├── file_parser_service.py # File parsing service
│   │   ├── export_service.py   # PPTX/PDF export service
│   │   ├── task_manager.py     # Asynchronous task management
│   │   ├── prompts.py          # AI prompt templates
│   ├── controllers/            # API controllers
│   │   ├── project_controller.py      # Project management
│   │   ├── page_controller.py         # Page management
│   │   ├── material_controller.py     # Material management
│   │   ├── template_controller.py     # Template management
│   │   ├── reference_file_controller.py # Reference file management
│   │   ├── export_controller.py       # Export functionality
│   │   └── file_controller.py         # File upload
│   ├── utils/                  # Utility functions
│   │   ├── response.py         # Unified response format
│   │   ├── validators.py       # Data validation
│   │   └── path_utils.py       # Path handling
│   ├── instance/               # SQLite database (auto-generated)
│   ├── exports/                # Export files directory
│   ├── Dockerfile
│   └── README.md
│
├── tests/                      # Test files directory
├── v0_demo/                    # Early demo version
├── output/                     # Output files directory
│
├── pyproject.toml              # Python project configuration (uv management)
├── uv.lock                     # uv dependency lock file
├── docker-compose.yml          # Docker Compose configuration
├── .env.example                 # Environment variable example
├── LICENSE                     # License
└── README.md                   # This file

Communication Group

To facilitate communication and mutual assistance, this WeChat group has been created.

Suggestions for new features and other feedback are welcome. I will answer questions there as time permits.

image

🔧 FAQ

  1. Generated page text is garbled or blurry

    • You can choose a higher resolution output (the OpenAI format may not support resolution adjustments; using the Gemini format is recommended). Based on testing, increasing the resolution from 1k to 2k before generating the page significantly improves text rendering quality.
    • Please ensure that the specific text content to be rendered is included in the page description.
  2. Poor results when exporting editable PPT, such as overlapping text or missing styles

    • In 90% of cases, this is due to API configuration issues. Please refer to issue 121 for troubleshooting and solutions.
  3. Does it support the free-tier Gemini API Key?

    • The free tier only supports text generation and does not support image generation.
  4. 503 Error or Retry Error prompted during content generation

    • You can check the Docker backend logs using the commands in the README to locate the detailed error for the 503 issue. This is generally caused by incorrect model configuration.
  5. Why does the API Key set in .env not take effect?

    • After editing the .env file during runtime, you need to restart the Docker container to apply the changes.
    • If parameters were previously configured on the web settings page, they will override the values in .env. You can revert to the .env settings by selecting "Restore Default Settings".
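The precedence described in the last answer can be pictured as a simple dictionary merge. This is an illustration of the behavior only, not the project's implementation: `effective_settings` and both argument names are hypothetical.

```python
def effective_settings(
    env_values: dict[str, str], web_overrides: dict[str, str]
) -> dict[str, str]:
    """Merge settings so that values saved on the web page win over .env."""
    # Later mappings win in a dict merge, so web overrides replace .env values
    return {**env_values, **web_overrides}


env_values = {"AI_PROVIDER_FORMAT": "gemini", "GOOGLE_API_KEY": "key-from-env"}
web_overrides = {"GOOGLE_API_KEY": "key-from-settings-page"}

# With an override saved, the .env key is shadowed
print(effective_settings(env_values, web_overrides)["GOOGLE_API_KEY"])

# "Restore Default Settings" corresponds to clearing the overrides
print(effective_settings(env_values, {})["GOOGLE_API_KEY"])
```

In this picture, editing `.env` has no visible effect for a key as long as an override for that key is still stored.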

🤝 Contributing Guide

Contributions to this project via Issues and Pull Requests are welcome!

Important: Please read CONTRIBUTING.md before contributing.

📄 License

This project is open-sourced under the GNU Affero General Public License v3.0 (AGPL-3.0) and can be freely used for non-commercial purposes such as personal learning, research, experimentation, education, or non-profit scientific research.

A Commercial License is required for commercial use (e.g., closed-source use, private deployment and delivery, integrating this project into closed-source products, or providing services without disclosing the corresponding source code). Please contact the author: anionex@qq.com

🚀 Sponsor


AIHubMix

Thanks to AIHubMix for sponsoring this project


image

Thanks to AI Fire for sponsoring this project: "Aggregating global multi-model API service providers. Enjoy secure, stable, and 24/7 access to the world's latest models at lower prices."

Acknowledgements

  • Project Contributors:

Contributors

Support

Open source is not easy 🙏 If this project is valuable to you, feel free to buy the developer a coffee ☕️

image

Thanks to the following friends for their voluntary sponsorship and support:

@雅俗共赏, @曹峥, @以年观日, @John, @胡yun星Ethan, @azazo1, @刘聪NLP, @🍟, @苍何, @万瑾, @biubiu, @law, @方源, @寒松Falcon

If you have any questions about the sponsorship list, please contact the author.

📈 Project Statistics

Star History Chart