Skip to content

Vision Language Model Support for Subqueries #117

@famitzsy8

Description

@famitzsy8

Recursive Vision Language Models

There is huge potential in making the RLM framework multi-modal, meaning giving it the ability to append images and other files to its prompts. Agents working with multiple 1000-page documents that contain figures, tables, maps and charts could become enormously powerful if the LLM they are connected with allows for visual processing (which most models nowadays do).

Image

Currently, I am working on this fork to make it possible such that RLMs can send images into their sub-calls. It works pretty well for now, but I think it is still premature to open a PR, for the following reasons:

@alexzhang13 & Community: Let me know if you want to see this Pull Request happen, and what needs to be discussed/resolved/implemented before

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions