Support vision input for Planner #472

liqul · 2025-03-11T09:30:51Z

Modified message formatting to support vision input for the OpenAI API
Added an ImageReader role that processes input images so the Planner can get the image URL or content
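For reference, the OpenAI Chat Completions API accepts vision input by turning a message's `content` from a plain string into a list of typed parts (`text` and `image_url`). The sketch below shows only that message shape; `build_vision_message` is a hypothetical name for illustration, not the helper added in this PR.

```python
from typing import Dict, List, Optional, Union


def build_vision_message(
    role: str,
    text: str,
    image_urls: Optional[List[str]] = None,
) -> Dict[str, Union[str, List[Dict[str, object]]]]:
    """Build a chat message in the OpenAI multi-part content format (hypothetical helper).

    Without images, content stays a plain string (the classic text-only format).
    With images, content becomes a list: one text part followed by one
    image_url part per image (http(s) URL or base64 data URL).
    """
    if not image_urls:
        return {"role": role, "content": text}
    parts: List[Dict[str, object]] = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": role, "content": parts}
```

Keeping the plain-string form when there are no images means text-only conversations are serialized exactly as before.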

Copilot

PR Overview

This pull request adds support for vision input in the Planner by updating message formatting and introducing a new role for processing image inputs.

Updated Chat message types and introduced a helper for constructing content messages with image URLs.
Created a new ImageReader role with corresponding configuration to read images and convert local paths into data URLs.
Modified Planner and attachment handling to incorporate image attachment processing.
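Converting a local image path into a data URL, as described for the ImageReader above, can be sketched with the standard library alone. `local_image_to_data_url` is a hypothetical helper, and guessing the MIME type from the file extension is an assumption, not confirmed behavior of the PR.

```python
import base64
import mimetypes
import os


def local_image_to_data_url(image_path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"The file {image_path} does not exist.")
    # Assumption: the MIME type is inferred from the file extension only.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        # Unknown or non-image extension: fall back to a generic binary type.
        mime_type = "application/octet-stream"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The resulting string can be placed directly in an `image_url` content part, so remote URLs and local files end up in the same format downstream.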

Reviewed Changes

File	Description
taskweaver/llm/util.py	Updated ChatMessage and added format_chat_message_content for vision input support.
taskweaver/ext_role/image_reader/image_reader.py	Added new ImageReader role to process image paths and generate data URLs.
taskweaver/ext_role/image_reader/image_reader.role.yaml	Added YAML configuration for the ImageReader role.
taskweaver/planner/planner.py	Modified conversation composition to include image attachments.
taskweaver/memory/attachment.py	Extended AttachmentType enum to include image_url for vision input.

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.


Copilot

Pull Request Overview

This pull request adds support for vision input to the Planner role by introducing a new ImageReader role that processes image paths provided in messages and converts local images to Base64 data URLs for downstream consumption. Additional changes include updates to message formatting functions to support image URLs, several Markdown documentation updates, and adjustments to attachment handling.
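How the ImageReader locates image paths inside a message is not spelled out here. A minimal heuristic sketch, assuming references are whitespace-delimited tokens ending in a common image extension, separates remote URLs (usable as-is) from local paths (which need conversion to data URLs). `split_image_refs` and its regex are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic: any non-whitespace token ending in a common image extension.
_IMAGE_REF = re.compile(r"\S+\.(?:png|jpe?g|gif|webp)\b", re.IGNORECASE)


def split_image_refs(message: str) -> Tuple[List[str], List[str]]:
    """Return (remote_urls, local_paths) for image references found in a message."""
    remote: List[str] = []
    local: List[str] = []
    for ref in _IMAGE_REF.findall(message):
        if ref.startswith(("http://", "https://")):
            remote.append(ref)  # already a URL the model can fetch
        else:
            local.append(ref)  # must be read from disk and encoded
    return remote, local
```

A regex like this is intentionally loose; a production role would more likely receive paths explicitly (for example from the Planner's instruction) than scrape them from free text.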

Reviewed Changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.


File	Description
taskweaver/ext_role/image_reader/image_reader.py	New ImageReader role implementation to process image inputs
taskweaver/llm/util.py	Updated chat message formatting to include image URL support
website/blog/authors.yml	Added author details
taskweaver/ext_role/image_reader/image_reader.role.yaml	Defined role configuration for ImageReader
taskweaver/planner/planner.py	Updated logic to process attachments with image URLs
taskweaver/code_interpreter/code_generator.py	Adjusted attachment handling to use the content field
website/blog/local_llm.md, evaluation.md, experience.md	Updated markdown front matter and content
taskweaver/memory/attachment.py & post.py	Updated AttachmentType and get_attachment return type for image_url support
taskweaver/code_interpreter/*, chat/console/chat.py	Minor content and messaging updates throughout the codebase
README.md	Updated news section to reflect vision input support
taskweaver/code_interpreter/code_interpreter_cli_only/code_interpreter_cli_only.py	Adjusted attachment handling for reply content

Comments suppressed due to low confidence (1)

taskweaver/memory/post.py:90

The get_attachment function now returns Attachment objects instead of their content strings; verify that calling code in other modules properly accesses the 'content' attribute where required.

def get_attachment(self, type: AttachmentType) -> List[Attachment]:
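A sketch of what this return-type change means for callers. `Attachment`, `AttachmentType`, and `Post` below are simplified stand-ins, not the real classes in taskweaver/memory: the enum members and the `attachment_list` field name are hypothetical. Code that previously received content strings must now read `.content` explicitly.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AttachmentType(Enum):
    # Placeholder members for illustration; the real enum has its own set.
    reply_content = "reply_content"
    image_url = "image_url"


@dataclass
class Attachment:
    type: AttachmentType
    content: str


@dataclass
class Post:
    # Hypothetical field name for the post's attachments.
    attachment_list: List[Attachment] = field(default_factory=list)

    def get_attachment(self, type: AttachmentType) -> List[Attachment]:
        # Return the Attachment objects themselves, not just their content strings.
        return [a for a in self.attachment_list if a.type == type]


post = Post(
    [
        Attachment(AttachmentType.reply_content, "done"),
        Attachment(AttachmentType.image_url, "https://example.com/a.png"),
    ]
)
# Callers that used to get strings must now read .content explicitly.
image_urls = [a.content for a in post.get_attachment(AttachmentType.image_url)]
```

Returning the full objects lets callers such as the Planner inspect the attachment type alongside its content, at the cost of touching every existing call site.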

taskweaver/ext_role/image_reader/image_reader.py

Co-authored-by: Copilot

Copilot

Pull Request Overview

This PR adds vision input support for the Planner role by modifying chat message formatting and introducing a new ImageReader role to process image inputs. Key changes include updating message formatting in llm/util.py, adding image conversion and a reader implementation in ext_role/image_reader, and propagating image attachment support throughout the codebase.

Reviewed Changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.


File	Description
taskweaver/llm/util.py	Updated chat message formatting to support image URLs via a new content structure
taskweaver/ext_role/image_reader/image_reader.py	Added ImageReader role with local image-to-data URL conversion and image processing logic
website/blog/authors.yml	Added author metadata for new contributors
taskweaver/planner/planner.py	Updated conversation composition to handle image attachments
website/blog/local_llm.md, evaluation.md, experience.md	Introduced/upgraded YAML front matter for blog posts
taskweaver/code_interpreter/code_interpreter/code_generator.py	Fixed extraction of attachment content for code feedback
taskweaver/memory/attachment.py	Added new attachment type “image_url”
taskweaver/code_interpreter/code_interpreter_plugin_only/code_interpreter_plugin_only.py	Updated extraction of function attachment content
README.md	Updated news section to announce vision input support
taskweaver/code_interpreter/code_interpreter_cli_only/code_interpreter_cli_only.py	Adjusted extraction of reply content from attachments
taskweaver/chat/console/chat.py	Minor update to the system message for new sessions
taskweaver/memory/post.py	Updated get_attachment to return Attachment objects instead of their content

Comments suppressed due to low confidence (1)

taskweaver/ext_role/image_reader/image_reader.py:31

Consider replacing print statements with logger.error calls to ensure that error messages are correctly captured in production logs.

print(f"Error: The file {image_path} does not exist.")
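A sketch of the suggested change using the standard `logging` module. `check_image_path` is a hypothetical wrapper around the existence check; how TaskWeaver injects loggers into roles is not shown here, so a module-level logger is assumed.

```python
import logging
import os
from typing import Optional

# Assumption: a plain module-level logger stands in for TaskWeaver's own logger wiring.
logger = logging.getLogger(__name__)


def check_image_path(image_path: str) -> Optional[str]:
    """Return the path if it exists; otherwise log an error and return None."""
    if not os.path.exists(image_path):
        # Lazy %-style arguments: the message is only formatted if the record is emitted.
        logger.error("Error: The file %s does not exist.", image_path)
        return None
    return image_path
```

Unlike `print`, the error then goes through whatever handlers, levels, and formatters are configured, so it lands in production logs rather than only on stdout.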

taskweaver/ext_role/image_reader/image_reader.py

Co-authored-by: Copilot

add image reader

958283a

liqul requested a review from Copilot March 11, 2025 09:30

Copilot AI reviewed Mar 11, 2025


liqul added 11 commits March 11, 2025 18:11

refactor for clear

27baa03

Merge branch 'main' of https://github.com/microsoft/TaskWeaver into users/liqun/vision

ebe185c

image message display

4cc25cb

remove progagate

1f87ca4

rename

634f4b6

rename

f1f305e

change default roles

06e2c80

reset default roles

50f3232

add doc

586324a

ujpdate doc

fccf2fa

update readme

e27a9ca

liqul requested review from Copilot, Jack-Q and ShilinHe March 14, 2025 08:28

Copilot AI reviewed Mar 14, 2025


taskweaver/ext_role/image_reader/image_reader.py (outdated, resolved)

Update taskweaver/ext_role/image_reader/image_reader.py

26f2848

Co-authored-by: Copilot

liqul requested a review from Copilot March 14, 2025 08:35

Copilot AI reviewed Mar 14, 2025


taskweaver/ext_role/image_reader/image_reader.py (outdated, resolved)

Update taskweaver/ext_role/image_reader/image_reader.py

b5b8d12

Co-authored-by: Copilot

Jack-Q approved these changes Mar 14, 2025


liqul merged commit f605b97 into main Mar 14, 2025
2 checks passed

liqul deleted the users/liqun/vision branch March 14, 2025 09:39
