Skip to content

Commit f605b97

Browse files
authored
Support vision input for Planner (#472)
- Modified message formatting to support vision input for OpenAI API - Added a role ImageReader to process input images so the Planner can get the URL/content of the image
2 parents 5505890 + b5b8d12 commit f605b97

File tree

26 files changed

+332
-44
lines changed

26 files changed

+332
-44
lines changed

README.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ Unlike many agent frameworks that only track the chat history with LLMs in text,
2323

2424

2525
## 🆕 News
26+
- 📅2025-03-13: TaskWeaver now supports vision input for the Planner role. Please check the [vision input](https://microsoft.github.io/TaskWeaver/blog/vision) for more details.👀
2627
- 📅2025-01-16: TaskWeaver has been enhanced with an experimental role called [Recepta](https://microsoft.github.io/TaskWeaver/blog/reasoning) for its reasoning power.🧠
2728
- 📅2024-12-23: TaskWeaver has been integrated with the [AgentOps](https://microsoft.github.io/TaskWeaver/docs/observability) for better observability and monitoring.🔍
2829
- 📅2024-09-13: We introduce the shared memory to store information that is shared between the roles in TaskWeaver. Please check the [memory](https://microsoft.github.io/TaskWeaver/docs/memory) for more details.🧠
@@ -31,7 +32,7 @@ Unlike many agent frameworks that only track the chat history with LLMs in text,
3132
- 📅2024-05-07: We have added two blog posts on [Evaluating a LLM agent](https://microsoft.github.io/TaskWeaver/blog/evaluation) and [Adding new roles to TaskWeaver](https://microsoft.github.io/TaskWeaver/blog/role) in the documentation.📝
3233
- 📅2024-03-28: TaskWeaver now offers all-in-one Docker image, providing a convenient one-stop experience for users. Please check the [docker](https://microsoft.github.io/TaskWeaver/docs/usage/docker) for more details.🐳
3334
- 📅2024-03-27: TaskWeaver now switches to `container` mode by default for code execution. Please check the [code execution](https://microsoft.github.io/TaskWeaver/docs/code_execution) for more details.🐳
34-
- 📅2024-03-07: TaskWeaver now supports configuration of different LLMs for various components, such as the Planner and CodeInterpreter. Please check the [multi-llm](https://microsoft.github.io/TaskWeaver/docs/llms/multi-llm) for more details.🔗
35+
<!-- - 📅2024-03-07: TaskWeaver now supports configuration of different LLMs for various components, such as the Planner and CodeInterpreter. Please check the [multi-llm](https://microsoft.github.io/TaskWeaver/docs/llms/multi-llm) for more details.🔗 -->
3536
<!-- - 📅2024-03-04: TaskWeaver now supports a [container](https://microsoft.github.io/TaskWeaver/docs/code_execution) mode, which provides a more secure environment for code execution.🐳 -->
3637
<!-- - 📅2024-02-28: TaskWeaver now offers a [CLI-only](https://microsoft.github.io/TaskWeaver/docs/advanced/cli_only) mode, enabling users to interact seamlessly with the Command Line Interface (CLI) using natural language.📟 -->
3738
<!-- - 📅2024-02-01: TaskWeaver now has a plugin [document_retriever](https://github.com/microsoft/TaskWeaver/blob/main/project/plugins/README.md#document_retriever) for RAG based on a knowledge base.📚 -->
@@ -43,7 +44,8 @@ Unlike many agent frameworks that only track the chat history with LLMs in text,
4344
<!-- - 📅2023-12-21: TaskWeaver now supports a number of LLMs, such as LiteLLM, Ollama, Gemini, and QWen🎈.) -->
4445
<!-- - 📅2023-12-21: TaskWeaver Website is now [available]&#40;https://microsoft.github.io/TaskWeaver/&#41; with more documentations.) -->
4546
<!-- - 📅2023-12-12: A simple UI demo is available in playground/UI folder, try it [here](https://microsoft.github.io/TaskWeaver/docs/usage/webui)! -->
46-
<!-- - 📅2023-11-30: TaskWeaver is released on GitHub🎈. -->
47+
- ......
48+
- 📅2023-11-30: TaskWeaver is released on GitHub🎈.
4749

4850

4951
## 💥 Highlights
@@ -68,7 +70,6 @@ We are looking forward to your contributions to make TaskWeaver better.
6870
- [ ] Support for prompt template management
6971
- [ ] Better plugin experiences, such as displaying updates or stopping in the middle of running the plugin and user confirmation before running the plugin
7072
- [ ] Async interaction with LLMs
71-
- [ ] Support for vision input for Roles such as the Planner and CodeInterpreter
7273
- [ ] Support for remote code execution
7374

7475

taskweaver/chat/console/chat.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -498,7 +498,7 @@ def _reset_session(self, first_session: bool = False):
498498
self.session.stop()
499499
self.session = self.app.get_session()
500500

501-
self._system_message("--- new session starts ---")
501+
self._system_message("--- new session started ---")
502502
self._assistant_message(
503503
"I am TaskWeaver, an AI assistant. To get started, could you please enter your request?",
504504
)

taskweaver/code_interpreter/code_interpreter/code_generator.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -251,7 +251,7 @@ def compose_conversation(
251251
# for code correction
252252
user_message += self.user_message_head_template.format(
253253
FEEDBACK=format_code_feedback(post),
254-
MESSAGE=f"{post.get_attachment(AttachmentType.revise_message)[0]}",
254+
MESSAGE=f"{post.get_attachment(AttachmentType.revise_message)[0].content}",
255255
)
256256

257257
assistant_message = self.post_translator.post_to_raw_text(

taskweaver/code_interpreter/code_interpreter_cli_only/code_interpreter_cli_only.py

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -60,9 +60,12 @@ def reply(
6060
prompt_log_path=prompt_log_path,
6161
)
6262

63-
code = post_proxy.post.get_attachment(type=AttachmentType.reply_content)[0]
63+
code = post_proxy.post.get_attachment(type=AttachmentType.reply_content)[0].content
6464
if len(code) == 0:
65-
post_proxy.update_message(post_proxy.post.get_attachment(type=AttachmentType.thought)[0], is_end=True)
65+
post_proxy.update_message(
66+
post_proxy.post.get_attachment(type=AttachmentType.thought)[0].content,
67+
is_end=True,
68+
)
6669
return post_proxy.end()
6770

6871
code_to_exec = "! " + code

taskweaver/code_interpreter/code_interpreter_plugin_only/code_interpreter_plugin_only.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -78,7 +78,7 @@ def reply(
7878
return post_proxy.end()
7979

8080
functions = json.loads(
81-
post_proxy.post.get_attachment(type=AttachmentType.function)[0],
81+
post_proxy.post.get_attachment(type=AttachmentType.function)[0].content,
8282
)
8383
if len(functions) > 0:
8484
code: List[str] = []

taskweaver/ext_role/image_reader/__init__.py

Whitespace-only changes.

0 commit comments

Comments
 (0)