
Presentation generator images#198

Open
dean2727 wants to merge 80 commits into
vibing-ai:ai-squad-003from
dean2727:presentation-generator-images

@dean2727 dean2727 commented Mar 21, 2025

Description

This PR adds image generation to the existing (updated) presentation generator. It improves the presentations generated for teachers by reducing the need for them to find relevant, effective visuals for their slides.

See Loom demo here: https://www.loom.com/share/7ebebeb92f2f41f9a6fdc5665e1fe8b5?sid=c9a0dc8c-aa1e-4742-a1dc-3d5212814215

Type of Change

Please select the type(s) of change that apply and delete those that do not.

  • Bug fix: A non-breaking change that fixes an issue.
  • New feature: A non-breaking change that adds functionality.
  • Breaking change: A change that causes existing functionality to not work as expected.
  • Documentation update: Changes or updates to documentation.
  • Code style update: Changes that do not affect the meaning of the code (e.g., formatting).
  • Refactoring: A code change that neither fixes a bug nor adds a feature.
  • Performance improvement: A change that improves performance.
  • Test enhancement: Adding or updating tests; no production code change.
  • Chore: Changes to the build process or auxiliary tools; no production code change.
  • Other: (please describe)

Proposed Solution

A new folder, image_generator, was created under presentation_generator_updated and contains the new functionality. No new route was added to router.py, because the functionality is intended to be called from within SlideGenerator; the image generator's executor therefore exists only for tests.
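A minimal sketch of this wiring, assuming slides are plain dataclasses and that SlideGenerator holds an ImageGenerator instance that it calls in-process once slides have been generated and validated (the commit history notes that images are generated after slide generation and validation). The class bodies, the finalize() and executor() names, and the example.com URL are hypothetical stand-ins, not the PR's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slide:
    title: str
    content: str
    needs_image: bool = False
    image_url: Optional[str] = None


class ImageGenerator:
    """Hypothetical stand-in for the class in the image_generator folder."""

    def generate_slide_image(self, slide: Slide) -> str:
        # The real class calls an image model and uploads to GCS; this stub fakes a URL.
        return f"https://example.com/{slide.title.lower().replace(' ', '-')}.png"


class SlideGenerator:
    def __init__(self, image_generator: Optional[ImageGenerator] = None):
        # The image generator is an in-process dependency, not an HTTP route in router.py.
        self.image_generator = image_generator or ImageGenerator()

    def finalize(self, slides: list[Slide]) -> list[Slide]:
        # Images are attached only after slides have been generated and validated.
        for slide in slides:
            if slide.needs_image:
                slide.image_url = self.image_generator.generate_slide_image(slide)
        return slides


def executor(title: str, content: str) -> dict:
    """Hypothetical test-only entry point that exercises ImageGenerator directly."""
    slide = Slide(title=title, content=content, needs_image=True)
    return {"image_url": ImageGenerator().generate_slide_image(slide)}
```

Keeping the image generator as a constructor argument also makes it easy to swap in a fake during SlideGenerator tests.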

Each generated slide gets 2 new attributes, needs_image and image_url. First, an LLM (Google Gemini 1.5 Flash) decides needs_image. If it is true and image creation succeeds, image_url is set to a publicly accessible Google Cloud Storage link for the new image. The prompt for the image model is also written by an LLM (Google Gemini 1.5 Flash), based on the slide title and content. If image generation fails, it is retried once; if the second attempt also fails, a fallback image URL is returned (note: this URL has yet to be determined). Errors are handled and logged thoroughly on every generate_slide_image() call to the new ImageGenerator.
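The decision, retry, and fallback flow can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the three callables stand in for the Gemini classifier, the Gemini prompt writer, and the image model plus GCS upload, and FALLBACK_IMAGE_URL is a placeholder because the real fallback URL has not been decided.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Placeholder only: the real fallback image URL is still a TODO in the PR.
FALLBACK_IMAGE_URL = "https://example.com/fallback.png"


def generate_slide_image(
    title: str,
    content: str,
    needs_image: Callable[[str, str], bool],
    build_prompt: Callable[[str, str], str],
    render_and_upload: Callable[[str], str],
    max_attempts: int = 2,
) -> Optional[str]:
    """Return an image URL, None if the slide needs no image, or the fallback URL."""
    if not needs_image(title, content):
        return None
    prompt = build_prompt(title, content)
    for attempt in range(1, max_attempts + 1):
        try:
            return render_and_upload(prompt)
        except Exception as exc:  # any model or upload failure triggers another attempt
            logger.warning("Image attempt %d/%d failed: %s", attempt, max_attempts, exc)
    logger.error("All %d image attempts failed; using fallback image", max_attempts)
    return FALLBACK_IMAGE_URL
```

With max_attempts=2 this matches the described behavior of one initial try plus one retry before falling back.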

Two image generation models are supported and can be selected in code (the frontend does not pass this choice in): Flux 1.1 (default) and Imagen 3. Images are generated for slides synchronously, taking around 3 to 3.3 seconds per image with the Flux model.
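A sketch of code-level model selection plus a back-of-the-envelope latency estimate. The enum names and string values are hypothetical; the 3 to 3.3 second per-image figure is the Flux timing quoted above. Because generation is synchronous, total image time grows linearly with the number of slides that need an image.

```python
from enum import Enum


class ImageModel(str, Enum):
    # Illustrative labels only, not confirmed identifiers from the codebase.
    FLUX_1_1 = "flux-1.1"
    IMAGEN_3 = "imagen-3"


# Chosen in code; the frontend cannot override it.
DEFAULT_IMAGE_MODEL = ImageModel.FLUX_1_1


def sequential_latency_range(
    num_images: int, per_image_s: tuple[float, float] = (3.0, 3.3)
) -> tuple[float, float]:
    """Images are generated one after another, so the totals are simple multiples."""
    low, high = per_image_s
    return (num_images * low, num_images * high)
```

For the six slide titles in the test payload further down, if every slide needed an image, this gives roughly 18 to 19.8 seconds spent on image generation alone.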

Explanation of design choices

| Task | Technology | Reason |
|---|---|---|
| Image generation | Imagen 3, Flux 1.1 (default) | Flux generates images faster than Imagen and renders text better |
| Backend API | FastAPI (Python) | Fast and scalable API framework |
| Image generation prompt (from slide content) | LLM | Handles the widest range of slide content without hardcoded values |
| Image generation classification | LLM | Avoids training a classification model, which would need many labeled examples and would be less accurate |
| Storage | Google Cloud Storage | Compared to Firestore: cheaper, scales better, serves files faster, and integrates with CDNs |
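Using an LLM as the classifier still leaves some plumbing: a prompt that asks for a machine-readable answer, and a parser that tolerates imperfect replies. The sketch below assumes a JSON reply format and defaults to "no image" when the reply cannot be parsed; the prompt wording and both function names are hypothetical, and the actual Gemini call is omitted.

```python
import json

FENCE = "`" * 3

# Hypothetical prompt; the PR's actual classification prompt is not shown here.
CLASSIFY_TEMPLATE = (
    "Decide whether this slide would benefit from an illustrative image.\n"
    "Title: {title}\nContent: {content}\n"
    'Answer with JSON only: {{"needs_image": true}} or {{"needs_image": false}}.'
)


def build_classification_prompt(title: str, content: str) -> str:
    return CLASSIFY_TEMPLATE.format(title=title, content=content)


def parse_needs_image(raw: str) -> bool:
    """Parse the LLM reply defensively; anything unparseable means no image."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Models sometimes wrap JSON in a Markdown code fence; drop backticks and language tag.
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    # Only an explicit JSON true counts; other shapes or values mean "no image".
    return isinstance(data, dict) and data.get("needs_image") is True
```

Defaulting to False keeps a malformed reply from triggering an unnecessary (and slow) image generation call.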

How to Test

To test the new unit tests:

  1. cd into image_generator/tests/
  2. Run pytest test_core.py

Unit Tests

Slide generator tests

test_executor() - Updated to handle new response format with "data" wrapper
test_executor_missing_inputs() - Unchanged
test_executor_loader_error() - Updated to expect ToolExecutorError
test_executor_unexpected_error() - Unchanged
test_validate_slides_content() - Unchanged
test_validate_slides_content_with_garbage() - Unchanged
test_validate_slides_content_empty_slides() - Unchanged
test_slide_generator_compile_context() - Updated to handle new chain tuple return
test_slide_model() - Updated to include needs_image and image_url fields
test_slide_presentation_model() - Updated to include needs_image and image_url fields
test_image_determination() - NEW - Tests the new image determination functionality

Image generator tests

Prompt Construction Tests:
test_construct_image_generation_prompt_string_content() - Tests prompt construction with string content
test_construct_image_generation_prompt_list_content() - Tests prompt construction with list content
test_construct_image_generation_prompt_dict_content() - Tests prompt construction with dictionary content
test_construct_image_generation_prompt_error_handling() - Tests error handling in prompt generation

Image Generation Tests:
test_generate_slide_image_flux() - Tests image generation with flux model
test_generate_slide_image_imagen() - Tests image generation with imagen model
test_generate_slide_image_retry() - Tests retry mechanism on failure
test_generate_slide_image_all_attempts_fail() - Tests fallback to placeholder when all attempts fail

Executor Function Tests:
test_executor_successful() - Tests successful execution
test_executor_missing_inputs() - Tests handling of missing required inputs
test_executor_loader_error() - Tests handling of LoaderError
test_executor_general_exception() - Tests handling of general exceptions
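The prompt construction tests above cover string, list, and dictionary slide content plus error handling. Below is a minimal sketch of that kind of logic, assuming content is flattened to text before being sent to the prompt-writing LLM and that an LLM failure falls back to a title-only prompt. The names content_to_text, build_meta_prompt, and image_prompt_for_slide and the fallback wording are hypothetical; the PR's construct_image_generation_prompt may work differently.

```python
from typing import Any, Callable


def content_to_text(content: Any) -> str:
    """Flatten slide content (string, list, or dict) into plain text."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        return "\n".join(f"- {content_to_text(item)}" for item in content)
    if isinstance(content, dict):
        return "\n".join(f"{key}: {content_to_text(value)}" for key, value in content.items())
    return str(content)


def build_meta_prompt(title: str, content: Any) -> str:
    """Ask an LLM to write a text-to-image prompt for one slide (wording is hypothetical)."""
    return (
        "Write a single prompt for a text-to-image model that illustrates this slide "
        "for a classroom presentation.\n"
        f"Slide title: {title}\n"
        f"Slide content:\n{content_to_text(content)}"
    )


def image_prompt_for_slide(title: str, content: Any, llm: Callable[[str], str]) -> str:
    try:
        return llm(build_meta_prompt(title, content)).strip()
    except Exception:
        # Assumed fallback: a simple prompt built from the title alone.
        return f"An educational illustration of: {title}"
```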

Documentation Updates

Indicate whether documentation needs to be updated due to this PR.

  • [ ] Yes
  • No

Checklist

  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

Additional Information

  • A new environment variable, REPLICATE_API_TOKEN, must be set before testing (see Replicate's API token page).
  • Note the TODO comment: if this PR is approved, the fallback image for failed image generation needs to be uploaded to GCS and its URL updated.
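A sketch of failing fast with a clear message when required configuration is missing, along the lines the review further down recommends. REPLICATE_API_TOKEN comes from this PR and GCP_PROJECT_ID from the review; MissingConfigError and require_env are hypothetical names, and an environ mapping can be passed in so the check is easy to unit test.

```python
import os
from typing import Mapping, Optional


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is not set."""


REQUIRED_ENV_VARS = ("REPLICATE_API_TOKEN", "GCP_PROJECT_ID")


def require_env(
    names: tuple = REQUIRED_ENV_VARS, environ: Optional[Mapping[str, str]] = None
) -> dict:
    """Return the requested variables, or raise one error listing every missing name."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingConfigError(
            "Missing required environment variable(s): "
            + ", ".join(missing)
            + ". Set them before running the image generator or its tests."
        )
    return {name: environ[name] for name in names}
```

Reporting all missing names at once avoids a fix-one-rerun-fail loop when several variables are unset.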

Bruna Costa Lacerda Silva and others added 30 commits January 20, 2025 11:49
…ansformation

Epic 2.6 query transformation
Assistant generated file changes:
- app/tools/multiple_choice_quiz_generator/metadata.json: Add quiz description field
- app/tools/multiple_choice_quiz_generator/prompt/multiple_choice_quiz_generator_prompt.txt: Update prompt with quiz description
- app/tools/multiple_choice_quiz_generator/core.py: Add LangChain integration

---

User prompt:

I want to do a few tasks: Re-benchmarking, Input set testing on the branch, Add a new "Quiz Description" field, Update the instructional prompt, add langchain using these credentials: LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="<redacted>"
LANGSMITH_PROJECT="pr-majestic-mining-82"
OPENAI_API_KEY="<redacted>"
and collaborate using this site "https://www.notion.so/Steven-H-190456247e0a8024b9daedbcfaeabcdc"
…nsive instructions

Assistant generated file changes:
- app/tools/multiple_choice_quiz_generator/prompt/multiple_choice_quiz_generator_prompt.txt: Update quiz generator prompt with new comprehensive instructions

---

User prompt:

for the instructional prompt i want  it to say " ### Multiple Choice Quiz Generator Prompt

You are an AI-powered Multiple Choice Quiz Generator. Your task is to create a quiz based on user-defined topics. Follow these instructions carefully:

1. **Quiz Structure**
   - Generate a set of multiple-choice questions.
   - Each question should have four answer options (A, B, C, and D).
   - Clearly indicate the correct answer.

2. **User Input**
   - Allow users to specify the number of questions.
   - Users can input their own questions and answer choices or request auto-generated ones.

3. **Interactive Quiz Execution**
   - Present each question one at a time.
   - Accept user responses (A, B, C, or D).
   - Provide immediate feedback on whether the answer was correct or incorrect.

4. **Scoring and Results**
   - Keep track of correct and incorrect answers.
   - Display a final score at the end of the quiz.
   - Optionally, show correct answers for any missed questions.

5. **Customization Options (Optional Enhancements)**
   - Support different difficulty levels.
   - Allow timed quizzes.
   - Enable saving and loading of question banks.

6. **Answer Key**
   - At the end of the quiz, generate an answer key displaying each question, the correct answer, and the user's response.
   - Example format:
     ```
     1. What is the capital of France?
        Correct Answer: C) Paris
        Your Answer: B) Madrid (Incorrect)
     ```

7. **State Standard Alignment**
   - Include a section for specifying relevant state educational standards (e.g., TEKS, Common Core).
   - Example format:
     ```
     State Standard: TEKS 5.4A - The student applies mathematical process standards to develop an understanding of patterns.
     ```
Assistant generated file changes:
- app/tools/multiple_choice_quiz_generator/tools.py: Add voting fields to QuizQuestion model, Add voting method to QuizBuilder
- app/tools/multiple_choice_quiz_generator/tests/test_core.py: Add voting test cases

---

User prompt:

I want to add a feature where the user can vote with either a thumbs up or thumbs down for the MCQ Generated Quiz
Assistant generated file changes:
- app/tools/multiple_choice_quiz_generator/metadata.json: Update quiz description field

---

User prompt:

please update the quiz description field with this "This multiple-choice quiz is designed to assess knowledge on a selected topic while aligning with state educational standards. Each question includes four answer choices (A, B, C, and D), with immediate feedback on correctness. At the end of the quiz, participants will receive a score report along with an answer key. The quiz can be customized to include user-defined questions and relevant state standards (e.g., TEKS, Common Core). Optional enhancements include timed questions, difficulty levels, and question bank storage. "
Assistant generated file changes:
- app/tools/multiple_choice_quiz_generator/tests/test_core.py: Add vocabulary quiz test

---

User prompt:

i want to test this with this "### **Example 5: Vocabulary Quiz**

1. **Topic:** Vocabulary - Science Terms
2. **Grade Level:** 7th Grade
3. **Number of Questions:** 10
4. **Attachment File:**

    [Science_Glossary.pdf](attachment:11f3af4e-09e3-4384-a310-1018d784bdfa:Science_Glossary.pdf)

5. **Description of Assessment:**
    - Vocabulary quiz focusing on scientific terms related to physics, chemistry, and biology. Include definitions and context-based questions."
Assistant generated file changes:
- run_quiz.py: Add script to run quiz generator

---

User prompt:

how do i see the quiz output
Assistant generated file changes:
- .replit: Update run configuration to use correct app path

---

User prompt:

Running the code in my project results in the following:

```
Traceback (most recent call last):
  File "/home/runner/workspace/.pythonlibs/bin/uvicorn", line 5, in <module>
    from uvicorn.main import main
ModuleNotFoundError: No module named 'uvicorn'

```

If applicable, propose a fix immediately.
Assistant generated file changes:
- .replit: Update app module path

---

User prompt:

Running the code in my project results in the following:

```
ERROR:    Error loading ASGI app. Could not import module "undefined".

```

If applicable, propose a fix immediately.
Assistant generated file changes:
- run_quiz.py: Create vocabulary quiz script

---

User prompt:

### **Example 5: Vocabulary Quiz**

1. **Topic:** Vocabulary - Science Terms
2. **Grade Level:** 7th Grade
3. **Number of Questions:** 10
4. **Attachment File:**

    [Science_Glossary.pdf](attachment:11f3af4e-09e3-4384-a310-1018d784bdfa:Science_Glossary.pdf)

5. **Description of Assessment:**
    - Vocabulary quiz focusing on scientific terms related to physics, chemistry, and biology. Include definitions and context-based questions.
dean2727 and others added 22 commits March 15, 2025 21:55
…s. Images generated after slide generation & validation
…rate-slide-image POST request (also fix on get_docs to include gdoc)
…ave had to pass in the absolute path to the function, so the function would have just been a with open() and read())
…s. Images generated after slide generation & validation
…within presentation generator, which slide generator will use
@buriihenry buriihenry self-assigned this Mar 22, 2025
buriihenry

This comment was marked as duplicate.

Contributor

@buriihenry buriihenry left a comment

Superb work. I know you are preloading the JSON payload locally in your code for generating slide images. Could you test it on FastAPI via the Submit tool? That would be the best approach, given how we have been testing other tools.

@dean2727
Author

Thank you, Henry! It has been tested and works.

@buriihenry
Copy link
Copy Markdown
Contributor

Could you share the JSON payload for testing?

Contributor

@buriihenry buriihenry left a comment

Please share the JSON object for testing.

@dean2727
Author

dean2727 commented Mar 28, 2025

See the following payload:

{
    "user": {
        "id": "string",
        "fullName": "string",
        "email": "string"
    },
    "type": "tool",
    "tool_data": {
        "tool_id": "slide-generator",
        "inputs": [
            {
                "name": "instructional_level",
                "value": "intermediate"
            },
            {
                "name": "topic",
                "value": "Science"
            },
            {
                "name": "slides_titles",
                "value": ["Introduction: The Ever-Cycling Water", "Evaporation and Transpiration: Waters Journey Up", "Condensation and Precipitation: Waters Descent", "Collection and Storage: Where Water Resides", "The Water Cycle in Action: Real-World Examples (Interactive)", "Summary: The Importance of the Water Cycle"]
            },
            {
                "name": "lang",
                "value": "en"
            }
        ]
    }
}
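The inputs array in this payload is a list of name/value pairs. A small sketch for flattening it and checking for missing fields before submitting, for example when exercising the missing-inputs path. The required-field tuple is assumed from the four inputs in this payload, and both helper names are hypothetical.

```python
from typing import Any


def inputs_by_name(payload: dict) -> dict[str, Any]:
    """Flatten tool_data.inputs ([{"name": ..., "value": ...}, ...]) into name -> value."""
    return {item["name"]: item["value"] for item in payload["tool_data"]["inputs"]}


def missing_inputs(
    payload: dict,
    required: tuple = ("instructional_level", "topic", "slides_titles", "lang"),
) -> list[str]:
    """List required input names absent from the payload (required set is an assumption)."""
    present = inputs_by_name(payload)
    return [name for name in required if name not in present]
```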

@stevenrayhinojosa-gmail-com
Contributor

I've thoroughly tested your PR for the presentation generator image integration, and I'm impressed with the quality and robustness of your implementation. The code is well-structured, thoroughly tested, and includes excellent error handling.

Strengths
Comprehensive Implementation: Your image generator provides a complete solution with support for both Flux and Imagen models, giving flexibility in image generation approaches.
Excellent Error Handling: The implementation includes robust error handling with a retry mechanism and fallback to placeholder images when generation fails.
Well-Designed Architecture: The separation of concerns between prompt construction and image generation is clean and maintainable.
Thorough Test Coverage: The test suite is comprehensive, covering all aspects of the implementation including error cases and edge conditions.
Smart Prompt Engineering: Using Gemini to generate image prompts based on slide content is a clever approach that should produce better results than direct prompting.
Cloud Storage Integration: The integration with Google Cloud Storage for image hosting is well-implemented and includes proper error handling.
Detailed Documentation: The code is well-documented with clear docstrings and comments explaining the purpose and behavior of each component.
Areas for Improvement
Missing Metadata: The metadata.json file is empty, which would need to be completed before this can be properly registered as a tool.
Not Registered in tools_config.json: The image generator is not yet registered in the tools_config.json file.
No Integration with Slide Generator: While the image generator works well as a standalone tool, it's not yet integrated with the slide generator to automatically add images to slides.
Environment Variables: The implementation relies on environment variables (GCP_PROJECT_ID) that may not be set in all environments.
Recommendations
Complete Metadata: Add the required fields to the metadata.json file to properly register this as a tool.
Register in tools_config.json: Add an entry for the image generator in the tools_config.json file.
Integrate with Slide Generator: Consider adding code to the slide generator to automatically call the image generator for each slide.
Environment Variable Fallbacks: Add fallbacks or clear error messages when required environment variables are missing.
Add Documentation: Consider adding a README.md file explaining how to use the image generator and how it integrates with the presentation generator.
Conclusion
This is a high-quality PR that adds valuable functionality to the presentation generator. The image generation implementation is robust, well-tested, and should provide excellent results. With a few minor additions to complete the integration, this PR would be ready for merging.

Great work on implementing this feature! The attention to detail in error handling and the comprehensive test suite are particularly impressive.

Labels

None yet

Projects

None yet

7 participants