Presentation generator images by dean2727 · Pull Request #198 · vibing-ai/marvel-ai-backend

dean2727 · 2025-03-21T16:01:12Z

Description

This PR adds image generation to the existing (updated) presentation generator. The feature improves the presentations generated for teachers by reducing the need to search for relevant, effective visuals for each slide.

See Loom demo here: https://www.loom.com/share/7ebebeb92f2f41f9a6fdc5665e1fe8b5?sid=c9a0dc8c-aa1e-4742-a1dc-3d5212814215

Type of Change

Please select the type(s) of change that apply and delete those that do not.

Proposed Solution

A new folder, image_generator, was created under presentation_generator_updated and contains the new functionality. Note: no new route was added to router.py, since the functionality is intended to be called from SlideGenerator; the image generator's executor therefore exists only for tests.

For each generated slide, two new attributes, needs_image and image_url, are determined. First, an LLM (Google Gemini 1.5 Flash) determines needs_image. If it is true, an image is generated, and on success image_url is set to a publicly accessible Google Cloud Storage link. The prompt for the image model is also written by an LLM (Google Gemini 1.5 Flash), using the slide title and content. If image generation fails, it is retried once; if the second attempt also fails, a fallback image URL is returned (note: this URL has yet to be determined). Errors are handled and logged within each generate_slide_image() call to the new ImageGenerator.
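The retry-then-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `generate_with_retry`, the `generate` callable, and `FALLBACK_IMAGE_URL` are hypothetical names, and the fallback URL is a placeholder because the real one has not been chosen yet.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical placeholder: the PR states the real fallback URL is still to be determined.
FALLBACK_IMAGE_URL = "https://example.com/fallback.png"


def generate_with_retry(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 2,
) -> str:
    """Call `generate` up to `max_attempts` times; return the fallback URL if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            # `generate` stands in for the image model call plus the upload step
            # and is assumed to return a public image URL on success.
            return generate(prompt)
        except Exception as exc:
            logger.warning(
                "Image generation attempt %d/%d failed: %s",
                attempt, max_attempts, exc,
            )
    return FALLBACK_IMAGE_URL
```

With `max_attempts=2` this matches the "one initial attempt plus one retry" policy from the description; passing the generator in as a callable keeps the policy easy to test with a stub.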

Two image generation models are available and can be configured in the code (though not passed in by the frontend): Flux 1.1 (default) and Imagen 3. Images are generated for slides synchronously, at roughly 3 to 3.3 seconds per image with the Flux model.
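As a rough illustration of the model choice and of what synchronous generation costs, here is a small sketch. The enum, its string values, and `estimate_sync_seconds` are assumptions made for this example and are not taken from the PR's code; only the default (Flux) and the per-image timing come from the description.

```python
from enum import Enum


class ImageModel(str, Enum):
    """Hypothetical identifiers for the two supported models."""
    FLUX = "flux"      # Flux 1.1, the default per the PR description
    IMAGEN = "imagen"  # Imagen 3


DEFAULT_IMAGE_MODEL = ImageModel.FLUX

# Upper end of the 3 to 3.3 second range reported for the Flux model.
SECONDS_PER_IMAGE_FLUX = 3.3


def estimate_sync_seconds(
    num_images: int,
    seconds_per_image: float = SECONDS_PER_IMAGE_FLUX,
) -> float:
    """Estimate total time when images are generated one after another."""
    return num_images * seconds_per_image


# Six slides that all need an image, generated sequentially.
print(round(estimate_sync_seconds(6), 1))
```

For a six-slide deck in which every slide needs an image, this comes to roughly 20 seconds of image generation on top of slide generation itself.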

Explanation of design choices

| Task | Technology | Reason |
| --- | --- | --- |
| Image generation | Imagen 3, Flux 1.1 (default) | Flux generates images faster than Imagen and renders text better |
| Backend API | FastAPI (Python) | Fast and scalable API framework |
| Image generation prompt (from slide content) | LLM | Covers the widest range of cases and avoids hardcoding values |
| Image generation classification | LLM | Circumvents the need to train a classification model, which requires many training examples and would be less accurate |
| Storage | Google Cloud Storage | Cheaper, scales better, serves files faster, and integrates with CDNs (compared to Firestore) |

How to Test

To run the new unit tests:

cd into image_generator/tests/
Run pytest test_core.py

Unit Tests

Slide generator tests

test_executor() - Updated to handle new response format with "data" wrapper
test_executor_missing_inputs() - Unchanged
test_executor_loader_error() - Updated to expect ToolExecutorError
test_executor_unexpected_error() - Unchanged
test_validate_slides_content() - Unchanged
test_validate_slides_content_with_garbage() - Unchanged
test_validate_slides_content_empty_slides() - Unchanged
test_slide_generator_compile_context() - Updated to handle new chain tuple return
test_slide_model() - Updated to include needs_image and image_url fields
test_slide_presentation_model() - Updated to include needs_image and image_url fields
test_image_determination() - NEW - Tests the new image determination functionality
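For orientation, the two new slide fields could look roughly like the sketch below. It uses a standard-library dataclass to stay dependency-free; the project's real model may well be defined differently, and the `title` and `content` fields as well as the consistency check are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slide:
    """Hypothetical slide model showing the two fields added by this PR."""
    title: str
    content: str
    needs_image: bool = False        # decided by the LLM classifier
    image_url: Optional[str] = None  # public image link, set only when an image is needed

    def __post_init__(self) -> None:
        # Assumed invariant: a URL is only meaningful when the slide needs an image.
        if self.image_url is not None and not self.needs_image:
            raise ValueError("image_url set on a slide that does not need an image")
```

A slide that needs no image keeps the defaults, for example `Slide(title="Summary", content="Key points")`.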

Image generator tests

Prompt Construction Tests:
test_construct_image_generation_prompt_string_content() - Tests prompt construction with string content
test_construct_image_generation_prompt_list_content() - Tests prompt construction with list content
test_construct_image_generation_prompt_dict_content() - Tests prompt construction with dictionary content
test_construct_image_generation_prompt_error_handling() - Tests error handling in prompt generation
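The prompt construction tests above cover string, list, and dictionary slide content. One way such content might be flattened into text before it is handed to the prompt-writing LLM is sketched below; `content_to_text` is a hypothetical helper, not a function from the PR.

```python
from typing import Any


def content_to_text(content: Any) -> str:
    """Flatten slide content (str, list, or dict) into a single text string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Each list item becomes its own line; nested items are flattened recursively.
        return "\n".join(content_to_text(item) for item in content)
    if isinstance(content, dict):
        # Keys are kept as labels so that section names survive flattening.
        return "\n".join(
            f"{key}: {content_to_text(value)}" for key, value in content.items()
        )
    raise TypeError(f"Unsupported slide content type: {type(content).__name__}")
```

Raising `TypeError` for anything else gives the error-handling case a well-defined failure to assert on.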

Image Generation Tests:
test_generate_slide_image_flux() - Tests image generation with flux model
test_generate_slide_image_imagen() - Tests image generation with imagen model
test_generate_slide_image_retry() - Tests retry mechanism on failure
test_generate_slide_image_all_attempts_fail() - Tests fallback to placeholder when all attempts fail

Executor Function Tests:
test_executor_successful() - Tests successful execution
test_executor_missing_inputs() - Tests handling of missing required inputs
test_executor_loader_error() - Tests handling of LoaderError
test_executor_general_exception() - Tests handling of general exceptions

Documentation Updates

Indicate whether documentation needs to be updated due to this PR.

[] Yes
No

Checklist

I have performed a self-review of my code.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added tests that prove my fix is effective or that my feature works.
New and existing unit tests pass locally with my changes.
Any dependent changes have been merged and published in downstream modules.

Additional Information

A new environment variable, REPLICATE_API_TOKEN, must be set prior to testing (see here for token API page)
Take note of the TODO comment, which mentions that, if this PR is approved, the fallback image for failed image generation would need to be added to GCS, with URL updated


buriihenry

Superb work. I know you are preloading the JSON payload locally in your code for generating slide images. Could you test it on FastAPI via Submit tool? This would be the best approach given how we have been testing other tools.

dean2727 · 2025-03-26T02:23:20Z

Thank you, Henry! It has been tested and works.

buriihenry · 2025-03-27T08:35:02Z

Could you please share the JSON payload for testing?

buriihenry

Share the JSON object for testing.

dean2727 · 2025-03-28T00:31:04Z

See the following payload:

{
    "user": {
        "id": "string",
        "fullName": "string",
        "email": "string"
    },
    "type": "tool",
    "tool_data": {
        "tool_id": "slide-generator",
        "inputs": [
            {
                "name": "instructional_level",
                "value": "intermediate"
            },
            {
                "name": "topic",
                "value": "Science"
            },
            {
                "name": "slides_titles",
                "value": ["Introduction: The Ever-Cycling Water", "Evaporation and Transpiration: Waters Journey Up", "Condensation and Precipitation: Waters Descent", "Collection and Storage: Where Water Resides", "The Water Cycle in Action: Real-World Examples (Interactive)", "Summary: The Importance of the Water Cycle"]
            },
            {
                "name": "lang",
                "value": "en"
            }
        ]
    }
}
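A payload of this shape can also be assembled programmatically, which avoids hand-editing the `inputs` list when trying other topics. The helper below is a hypothetical sketch: `build_tool_request` is not part of the repository, and the example only builds and prints the JSON without assuming any endpoint path.

```python
import json
from typing import Any, Dict, List


def build_tool_request(
    tool_id: str,
    inputs: Dict[str, Any],
    user: Dict[str, str],
) -> Dict[str, Any]:
    """Wrap name/value inputs in the request envelope shown above."""
    input_list: List[Dict[str, Any]] = [
        {"name": name, "value": value} for name, value in inputs.items()
    ]
    return {
        "user": user,
        "type": "tool",
        "tool_data": {"tool_id": tool_id, "inputs": input_list},
    }


payload = build_tool_request(
    "slide-generator",
    {
        "instructional_level": "intermediate",
        "topic": "Science",
        "slides_titles": ["Introduction: The Ever-Cycling Water"],
        "lang": "en",
    },
    {"id": "string", "fullName": "string", "email": "string"},
)
print(json.dumps(payload, indent=4))
```

The printed JSON can then be pasted into the FastAPI docs page or sent with any HTTP client.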

stevenrayhinojosa-gmail-com · 2025-05-04T02:27:56Z

I've thoroughly tested your PR for the presentation generator image integration, and I'm impressed with the quality and robustness of your implementation. The code is well-structured, thoroughly tested, and includes excellent error handling.

Strengths

Comprehensive Implementation: Your image generator provides a complete solution with support for both Flux and Imagen models, giving flexibility in image generation approaches.
Excellent Error Handling: The implementation includes robust error handling with a retry mechanism and fallback to placeholder images when generation fails.
Well-Designed Architecture: The separation of concerns between prompt construction and image generation is clean and maintainable.
Thorough Test Coverage: The test suite is comprehensive, covering all aspects of the implementation including error cases and edge conditions.
Smart Prompt Engineering: Using Gemini to generate image prompts based on slide content is a clever approach that should produce better results than direct prompting.
Cloud Storage Integration: The integration with Google Cloud Storage for image hosting is well-implemented and includes proper error handling.
Detailed Documentation: The code is well-documented with clear docstrings and comments explaining the purpose and behavior of each component.

Areas for Improvement

Missing Metadata: The metadata.json file is empty, which would need to be completed before this can be properly registered as a tool.
Not Registered in tools_config.json: The image generator is not yet registered in the tools_config.json file.
No Integration with Slide Generator: While the image generator works well as a standalone tool, it's not yet integrated with the slide generator to automatically add images to slides.
Environment Variables: The implementation relies on environment variables (GCP_PROJECT_ID) that may not be set in all environments.

Recommendations

Complete Metadata: Add the required fields to the metadata.json file to properly register this as a tool.
Register in tools_config.json: Add an entry for the image generator in the tools_config.json file.
Integrate with Slide Generator: Consider adding code to the slide generator to automatically call the image generator for each slide.
Environment Variable Fallbacks: Add fallbacks or clear error messages when required environment variables are missing.
Add Documentation: Consider adding a README.md file explaining how to use the image generator and how it integrates with the presentation generator.

Conclusion

This is a high-quality PR that adds valuable functionality to the presentation generator. The image generation implementation is robust, well-tested, and should provide excellent results. With a few minor additions to complete the integration, this PR would be ready for merging.

Great work on implementing this feature! The attention to detail in error handling and the comprehensive test suite are particularly impressive.
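The recommendation about clear error messages for missing environment variables could be addressed with a small startup check along these lines. This is only a sketch: `check_required_env` is a hypothetical helper, and the variable list is assumed from the two names mentioned in this thread (REPLICATE_API_TOKEN and GCP_PROJECT_ID).

```python
import os
from typing import Iterable, Mapping, Optional

# Assumed list, based on the variables named in this PR discussion.
REQUIRED_ENV_VARS = ("REPLICATE_API_TOKEN", "GCP_PROJECT_ID")


def check_required_env(
    names: Iterable[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Raise a descriptive error listing every required variable that is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling this once before the first image request would surface a configuration problem immediately instead of as a failed generation attempt.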

Bruna Costa Lacerda Silva and others added 30 commits January 20, 2025 11:49

FIX for repeating questions

8f475ec

Fix in the json_schema

455af20

Improved Instructions of the Prompt

8f86f62

Parcial commit: Enhacing user query

06eb474

partial refactor

9f08f11

Changing model parameter: max_output_tokens

1584555

Improved the instructions of the Prompt& updated tools.py to read the…

e2ad4cb

… prompt correctly

Solving conflicts from pull request

b02ba1f

Merge branch 'buriihenry-ai-squad-1' into epic-2.6-query-transformation

11b027a

Cleaning code

2bc22fa

similarity_score_threshold with default score_threshold of 0.4

1197680

refactor: split create_questions into smaller functions for better te…

d79d00c

…stability

Fix validation logic for max_attempts

231afb5

Merge pull request vibing-ai#170 from brunaclacerda/epic-2.6-query-tr…

48ca644

…ansformation Epic 2.6 query transformation

Add user inputs and improve prompts

1eecda9

User checkpoint: Add Replit configuration files for Nix package manag…

90ca80f

…ement.

User checkpoint: Add thumbs up/down voting feature to MCQ quiz.

d6d5200

User checkpoint: Add Science Glossary PDF for vocabulary quiz.

eb4158a

User checkpoint: Update Replit configuration to use Python 3.11

bf3ea03

Assistant checkpoint: Add quiz output viewer

9c63564

Assistant generated file changes: - run_quiz.py: Add script to run quiz generator --- User prompt: how do i see the quiz output

User checkpoint: Update .replit file to launch uvicorn with undefined…

3492798

… app module.

User checkpoint: Configure port forwarding for deployment on Replit.

c8d3136

dean2727 and others added 22 commits March 15, 2025 21:55

image generation API, using image (promt) generation utility function…

b266cdb

…s. Images generated after slide generation & validation

new syllabus tool restore

cfc3212

Merge pull request vibing-ai#194 from marvelai-org/restore-sylllabus-…

31b2c1a

…tool new syllabus tool restore

Tweaks and examples to slide image prompt generation

6a94646

Option to include flux or imagen model in API request, adapt function…

e664639

… according to model

Address complex math topics in prompt for prompt for generating images

5ec628f

Start per-slide images - image_url field in Slide, new route for gene…

42dac11

…rate-slide-image POST request (also fix on get_docs to include gdoc)

add read_text_file() to utils, import in all files that use it (reduc…

3a9380a

…e code duplication)

Remove the unnecessary abstraction layer (read_text_file) (we would h…

f3f7a29

…ave had to pass in the absolute path to the function, so the function would have just been a with open() and read())

outline generator tweaks

f60ebf3

image generation API, using image (promt) generation utility function…

840612d

…s. Images generated after slide generation & validation

Tweaks and examples to slide image prompt generation

8308ee5

Option to include flux or imagen model in API request, adapt function…

a33c91f

… according to model

Address complex math topics in prompt for prompt for generating images

854a619

GCS storage for generated images

36d7da6

Move slide image generation out of utils and into a new class/folder …

8a9cf4a

…within presentation generator, which slide generator will use

Image generator core.py and one retry logic/error handling in the tool

59d5fd1

Merge branch 'presentation-generator-images' of https://github.com/de…

f1f910c

…an2727/marvel-ai-backend into presentation-generator-images

Tests for slide image generation

4c41189

Determine if slide needs an image, with updated tests

639942b

No more route, since image generator itself is called in slide genera…

606d5d1

…tion functionality

Remove log contents that take up much space, fix to image generator r…

a453264

…esponse (redundant "data" key)

buriihenry self-assigned this Mar 22, 2025

buriihenry suggested changes Mar 24, 2025

buriihenry approved these changes Mar 27, 2025

dean2727 wants to merge 80 commits into vibing-ai:ai-squad-003 from dean2727:presentation-generator-images

7 participants
