[Feat/Question] Optimizing context window & token efficiency for visual extraction in Chat LLM interfaces #93

Description

@24KaratAu

I am leveraging PixelRAG for a research project that involves ingesting heavily dynamic, JavaScript-rendered websites and analyzing them through multimodal chat interfaces (e.g., ChatGPT/Claude Web UI).

While pixelshot does an excellent job of executing client-side JS and preserving complex layouts (such as tables and data graphs) via its 1568px tiled slicing strategy, passing these visual tiles directly into an LLM chat interface incurs significant vision-token overhead.

The Problem / Question

For text-heavy segments of dynamic sites, having a Vision-Language Model (VLM) read text back out of pixels can quickly exhaust the available context window and inflate token consumption.
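To make the overhead concrete, here is a rough back-of-envelope sketch. It assumes the approximation Anthropic documents for Claude image inputs (tokens ≈ width × height / 750) and the common rule of thumb of about 4 characters per token for English text. Both are heuristics: other providers count differently, and providers may downscale large images before counting. The function names are hypothetical and are not part of PixelRAG or pixelshot.

```python
import math


def estimate_image_tokens(width_px: int, height_px: int,
                          pixels_per_token: float = 750.0) -> int:
    """Rough vision-token estimate for one image tile.

    Assumes tokens ~= (width * height) / 750, an approximation that
    may not hold for other models or after provider-side resizing.
    """
    return math.ceil(width_px * height_px / pixels_per_token)


def estimate_text_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate using a ~4 chars/token heuristic."""
    return math.ceil(len(text) / chars_per_token)


def tile_is_cheaper_as_text(width_px: int, height_px: int, text: str) -> bool:
    """True when the extracted text is estimated to cost fewer tokens
    than sending the tile as an image."""
    return estimate_text_tokens(text) < estimate_image_tokens(width_px, height_px)
```

Under these assumptions, a tile carrying a few paragraphs of prose would usually be cheaper to send as text, while a dense chart with little extractable text would not.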

  1. Are there any existing best practices or hidden flags within the pipeline to mitigate token overhead when using a manual chat-interface workflow?
  2. Has there been consideration for a hybrid approach (e.g., extracting a lightweight parallel markdown/text layout chunk alongside the screenshot tile) to give users the option of text vs. pixel delivery depending on whether the asset is a chart or a text paragraph?

Proposed Enhancement (If applicable)

It would be highly valuable to have an option in the CLI or programmatic rendering API (e.g., pixelshot --output-hybrid) that outputs both the .jpg tile for visual assets (charts/infographics) and a stripped-down Markdown representation for plain text blocks. This would allow researchers to selectively drop text or pixels into their chat prompts, potentially saving thousands of vision tokens.
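As an illustration of what I mean, here is a minimal sketch of the per-block routing such a hybrid mode might perform. Everything in it is hypothetical: the `Block` structure, the `kind` labels, and the 40-character threshold are my own assumptions for illustration, not existing pixelshot behavior or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical block kinds that only make sense as pixels.
VISUAL_KINDS = {"chart", "image", "infographic"}


@dataclass
class Block:
    kind: str                        # e.g. "paragraph", "table", "chart"
    text: str                        # extracted text, empty if none
    tile_path: Optional[str] = None  # .jpg tile covering this block, if any


def choose_delivery(block: Block, min_text_chars: int = 40) -> str:
    """Return "pixels" or "text" for a single block."""
    has_text = len(block.text.strip()) >= min_text_chars
    if block.kind in VISUAL_KINDS and block.tile_path:
        return "pixels"
    if has_text:
        return "text"
    # Too little text to be useful: fall back to the tile when one exists.
    return "pixels" if block.tile_path else "text"


def build_manifest(blocks: List[Block]) -> List[Dict[str, Optional[str]]]:
    """Pair each block with the payload a user would paste into a chat."""
    manifest = []
    for block in blocks:
        mode = choose_delivery(block)
        payload = block.tile_path if mode == "pixels" else block.text.strip()
        manifest.append({"kind": block.kind, "mode": mode, "payload": payload})
    return manifest
```

A manifest like this would let me paste the Markdown for paragraphs and attach only the tiles flagged as "pixels". Tables are a judgment call, since they sometimes survive text extraction and sometimes do not.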
