Image Generation tool for Presentation Generator (PoC)#197

Open
sairishi-exe wants to merge 3 commits into vibing-ai:Develop from sairishi-exe:feature/presentation-generator-v2

@sairishi-exe sairishi-exe commented Mar 21, 2025

Description

This PR adds a new Slide Image Generator tool (similar to the Outline and Slide Generator tools) that automatically creates relevant images for presentation slides. The tool analyzes slide content, generates appropriate image prompts, and creates images using Vertex AI's Imagen model, processing slides concurrently. No additional endpoints have been added, as Slide Image Generator will be used as a tool.

Type of Change

  • New feature: A non-breaking change that adds functionality.

Proposed Solution

Slide Processing Pipeline

The implementation follows a structured processing pipeline:

  1. Content Analysis: Each slide is analyzed to determine if it needs an image

    • Skips slides like summaries, conclusions, thank you slides, etc.
    • Identifies content-rich slides that would benefit from visual elements
  2. Prompt Generation: For slides needing images, the tool creates safe, effective prompts

    • Uses Gemini Pro to generate contextually relevant image descriptions
    • Applies safety guardrails to ensure appropriate content
  3. Concurrent Processing: Slides are processed in parallel using ThreadPoolExecutor (this choice still needs to be evaluated)

    • Each slide is independently analyzed and processed
    • Optimizes throughput when generating multiple images
  4. Image Generation: Prompts are sent to Vertex AI's Imagen model

    • Generates high-quality, contextually relevant images
    • Uses imagen-3.0-fast-generate-001 for fast generation.
    • Handles API interactions and error cases
  5. Storage: Generated images are stored in a public Cloud Storage bucket

    • Creates unique filenames with timestamps and UUIDs
    • Makes images publicly accessible for presentation use
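
The pipeline steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the keyword list, the function names (`needs_image`, `unique_filename`, `process_slide`, `process_slides`), the title-only skip heuristic, and the prompt template are all assumptions made for the example. `fake_generate_image` stands in for the Gemini prompt step, the Imagen call, and the Cloud Storage upload so the sketch runs without Google Cloud access.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Hypothetical keyword list; the PR does not publish its actual skip rules.
SKIP_KEYWORDS = ("summary", "conclusion", "thank you", "q&a")


def needs_image(slide: dict) -> bool:
    """Return False for wrap-up slides such as summaries or thank-you slides."""
    title = slide.get("title", "").lower()
    return not any(keyword in title for keyword in SKIP_KEYWORDS)


def unique_filename(prefix: str = "slide", ext: str = "png") -> str:
    """Combine a UTC timestamp with a random UUID so names do not collide."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{prefix}_{stamp}_{uuid.uuid4().hex}.{ext}"


def process_slide(slide: dict, generate_image) -> dict:
    """Analyze one slide and attach an image name when one is warranted."""
    if not needs_image(slide):
        return {**slide, "image": None}
    # Assumed prompt template; the PR generates prompts with Gemini instead.
    prompt = f"Illustration for a presentation slide titled: {slide['title']}"
    generate_image(prompt)  # placeholder for the Imagen call and the upload
    return {**slide, "image": unique_filename()}


def process_slides(slides: list, generate_image, max_workers: int = 4) -> list:
    """Process slides in parallel; executor.map preserves the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda s: process_slide(s, generate_image), slides))


def fake_generate_image(prompt: str) -> bytes:
    """Stand-in backend so the sketch runs without Google Cloud access."""
    return prompt.encode("utf-8")


demo_slides = [
    {"title": "Vectors and Matrices: Fundamental Building Blocks"},
    {"title": "Summary and Conclusion: Key Takeaways and Further Exploration"},
]
results = process_slides(demo_slides, fake_generate_image)
```

Threads suit this workload because each slide spends most of its time waiting on network calls. Whether ThreadPoolExecutor is the right long-term choice is, as noted above, still to be evaluated.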

How to Setup and Test

I. Setup on Google Cloud

First, we need to set up a Google Cloud Service Account and a public bucket on Google Cloud Storage. Follow the Loom video below.

Google Cloud Setup Instructions

After successfully running the test_service_account.py file in marvel-ai-backend/app/tools/presentation_generator_updated/slide_image_generator/tests, you will have completed your Google Cloud setup.

II. Verification before Testing

Now to ensure we have the setups and configs done appropriately, verify that:

  1. Your Dockerfile contains the lines below, which copy your credentials file into the container. Because the image then holds your credentials, it should not be accessible to anyone besides yourself, as this could be a security issue.
# Copy credentials file
COPY marvel-ai-backend-credentials.json /code/marvel-ai-backend-credentials.json

# before this line
COPY ./app /code/app
  2. Your .env file is in app/.env and your marvel-ai-backend-credentials.json is in the root directory. Your .env file must contain the following variables:
# Update credentials path (the file is in the root directory under this name)
GOOGLE_APPLICATION_CREDENTIALS=./marvel-ai-backend-credentials.json
# Set according to your bucket name
SLIDE_IMAGES_BUCKET_NAME=presentation-slide-images

  3. Vertex AI API is enabled in your project on Google Cloud.

  4. You have the appropriate Python libraries installed for the Google Cloud SDK:

google-cloud-storage>=2.14.0
google-cloud-aiplatform>=1.40.0
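
The verification steps above could also be checked programmatically before starting the container. The sketch below is only an illustration: `missing_settings` and `REQUIRED_VARS` are hypothetical names that do not appear in this PR, and the check covers only the two environment variables listed above plus the existence of the credentials file.

```python
import os

# The two variables the .env file is expected to define (from the list above).
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "SLIDE_IMAGES_BUCKET_NAME")


def missing_settings(env=None) -> list:
    """Return a list of configuration problems; an empty list means all good."""
    env = os.environ if env is None else env
    # Report variables that are unset or empty.
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    # If a credentials path is given, make sure the file actually exists.
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if cred_path and not os.path.isfile(cred_path):
        problems.append(f"credentials file not found: {cred_path}")
    return problems


if __name__ == "__main__":
    for problem in missing_settings():
        print("Configuration problem:", problem)
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in a test.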

III. Testing

  1. In your command line, cd to your marvel-ai-backend/ root directory and build your Docker image using the following command:
docker build -t marvel-ai-backend .
  2. Run your Docker container with the .env file passed explicitly, like so:
docker run -d -p 8000:8000 \
  --env-file ./app/.env \
  --name marvel-ai-backend marvel-ai-backend
  3. Go to your Swagger UI at localhost:8000/docs and call the /submit-tool endpoint with the following sample payload:
{
  "user": {
    "id": "string",
    "fullName": "string",
    "email": "string"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "slide-image-generator",
    "inputs": [
      {
        "name": "slides",
        "value": [
          {
            "title": "Introduction: Linear Algebra's Role in Machine Learning",
            "template": "sectionHeader",
            "content": "Linear algebra is the foundational mathematics behind many machine learning algorithms. This presentation will cover key concepts and their applications."
          },
          {
            "title": "Vectors and Matrices: Fundamental Building Blocks",
            "template": "titleAndBody",
            "content": "Vectors represent data points, while matrices represent collections of data. Understanding vector operations (addition, scalar multiplication, dot product) and matrix operations (addition, multiplication, transpose) is crucial for manipulating data in machine learning. For example, images can be represented as matrices, and features in a dataset as vectors."
          },
          {
            "title": "Linear Transformations: Manipulating Data",
            "template": "titleAndBullets",
            "content": [
              "Linear transformations are functions that map vectors to other vectors in a linear way.",
              "They are represented by matrices.",
              "Examples include rotation, scaling, and projection.",
              "Crucial for tasks like dimensionality reduction and feature scaling in machine learning."
            ]
          },
          {
            "title": "Eigenvalues and Eigenvectors: Understanding Data Structure",
            "template": "titleAndBody",
            "content": "Eigenvalues and eigenvectors reveal fundamental properties of linear transformations and matrices. Eigenvectors remain unchanged in direction after a transformation, only scaled by the corresponding eigenvalue. In Principal Component Analysis (PCA), eigenvectors represent principal components, which capture the most variance in the data."
          },
          {
            "title": "Singular Value Decomposition (SVD): Dimensionality Reduction in Action",
            "template": "titleAndBody",
            "content": "SVD decomposes a matrix into three smaller matrices, revealing its underlying structure. This is useful for dimensionality reduction techniques like Latent Semantic Analysis (LSA) in natural language processing, reducing the number of dimensions while preserving essential information."
          },
          {
            "title": "Real-World Applications: Examples in Machine Learning Algorithms",
            "template": "twoColumn",
            "content": {
              "leftColumn": "Recommendation Systems: Collaborative filtering uses matrix factorization techniques based on SVD to predict user preferences.",
              "rightColumn": "Image Compression: SVD can be used to reduce the size of image files by approximating the original matrix with a lower-rank matrix."
            }
          },
          {
            "title": "Hands-on Exercise: Linear Algebra in Python (e.g., using NumPy)",
            "template": "titleAndBullets",
            "content": [
              "Use NumPy to create vectors and matrices.",
              "Perform basic vector and matrix operations.",
              "Implement a simple linear transformation.",
              "Analyze a small dataset using PCA."
            ]
          },
          {
            "title": "Summary and Conclusion: Key Takeaways and Further Exploration",
            "template": "titleAndBody",
            "content": "This presentation provided an overview of essential linear algebra concepts for machine learning. Further exploration of topics like matrix decompositions, vector spaces, and norms is recommended for a deeper understanding."
          }
        ]
      }
    ]
  }
}
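
Note that the `content` field in the sample above takes three shapes: a string, a list of bullet strings, and a two-column object. The sketch below shows one way a consumer might flatten these into plain text, and how the same request could be built outside Swagger UI. `slide_text` and `build_request` are hypothetical helpers, not part of this PR, and any authentication headers the backend may require are not covered here.

```python
import json
import urllib.request


def slide_text(slide: dict) -> str:
    """Flatten a slide's content (str, list, or dict) into a single string."""
    content = slide.get("content", "")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return " ".join(str(item) for item in content)
    if isinstance(content, dict):
        return " ".join(str(value) for value in content.values())
    return str(content)


def build_request(slides: list, base_url: str = "http://localhost:8000"):
    """Build (but do not send) a POST request mirroring the sample payload."""
    payload = {
        "user": {"id": "string", "fullName": "string", "email": "string"},
        "type": "tool",
        "tool_data": {
            "tool_id": "slide-image-generator",
            "inputs": [{"name": "slides", "value": slides}],
        },
    }
    return urllib.request.Request(
        f"{base_url}/submit-tool",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


example_request = build_request(
    [{"title": "Linear Transformations: Manipulating Data",
      "template": "titleAndBullets",
      "content": ["They are represented by matrices."]}]
)
```

With the container running, the request could be sent with `urllib.request.urlopen(example_request)`.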

Functionality Showcase

Example Usage of the Slide Image Generator

Unit Tests

No unit tests have been implemented yet, but you can use the test file at marvel-ai-backend/app/tools/presentation_generator_updated/slide_image_generator/tests/test_service_account.py to test basic Google Cloud Service Account usage.

Future Enhancements

Potential future improvements (not included in this PR):

  • Connection pooling for improved performance
  • In-memory caching or persistent caching like Redis to avoid regenerating similar images
  • Async processing for non-blocking API responses and tool execution (can also explore Celery)
  • More sophisticated image style control

Documentation Updates

  • Yes

Documentation needs to be updated to include:

  • Tool usage instructions
  • Expected input/output formats
  • Configuration requirements (environment variables)

Checklist

  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

Additional Information

Detailed Notion Document WIP

@buriihenry buriihenry self-assigned this Mar 22, 2025

@buriihenry buriihenry left a comment

Superb work.. Would be good to write unit tests for the slide image generator

@sairishi-exe
Author

Superb work.. Would be good to write unit tests for the slide image generator

Yes, working on some unit tests. Will get them out by tonight.
