[Feat/Question] Optimizing context window & token efficiency for visual extraction in Chat LLM interfaces

### Description 
I am using PixelRAG for a research project that ingests highly dynamic, JavaScript-rendered websites and analyzes them through multimodal chat interfaces (e.g., the ChatGPT or Claude web UI).

While `pixelshot` does an excellent job of executing client-side JS and preserving complex layouts (such as tables and data graphs) via its 1568px tiled slicing strategy, passing these visual tiles directly into an LLM chat interface incurs significant vision token overhead.

### The Problem / Question
For text-heavy segments of dynamic sites, having a Vision-Language Model (VLM) turn pixels back into text can consume far more tokens than the equivalent plain text, which quickly drains the available context window.

1. Are there any existing best practices or hidden flags within the pipeline to mitigate token overhead when using a manual chat-interface workflow?
2. Has there been consideration for a hybrid approach (e.g., extracting a lightweight parallel markdown/text layout chunk alongside the screenshot tile) to give users the option of text vs. pixel delivery depending on whether the asset is a chart or a text paragraph?
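
To make the overhead concrete, here is a rough back-of-the-envelope sketch of the comparison I have in mind. Both constants are assumptions for illustration only: the per-tile vision cost varies by model and by how the provider resizes images, and the characters-per-token ratio is just a common heuristic for English text. None of the function names come from PixelRAG.

```python
import math

# Assumed figures for illustration only; real costs depend on the model
# and on how the provider preprocesses images.
ASSUMED_TOKENS_PER_TILE = 1500  # hypothetical vision cost of one tile
ASSUMED_CHARS_PER_TOKEN = 4     # rough heuristic for English text


def estimate_text_tokens(text: str) -> int:
    """Approximate the token cost of sending `text` as plain text."""
    return math.ceil(len(text) / ASSUMED_CHARS_PER_TOKEN)


def estimate_tile_tokens(num_tiles: int) -> int:
    """Approximate the token cost of sending `num_tiles` screenshot tiles."""
    return num_tiles * ASSUMED_TOKENS_PER_TILE


def cheaper_delivery(text: str, num_tiles: int) -> str:
    """Return "text" when the text estimate is lower, otherwise "pixels"."""
    if estimate_text_tokens(text) < estimate_tile_tokens(num_tiles):
        return "text"
    return "pixels"
```

Under these assumed numbers, a 4,000-character paragraph that fits in one tile would be cheaper to send as text, while a very dense block could still favor the tile. That is why a per-block choice seems more useful than a global switch.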

### Proposed Enhancement (If applicable)
It would be highly valuable to have an option in the CLI or programmatic rendering API (e.g., `pixelshot --output-hybrid`) that outputs both the `.jpg` tile for visual assets (charts/infographics) and a stripped-down Markdown representation for plain text blocks. This would let researchers choose whether to drop text or pixels into their chat prompts, which could save a substantial number of vision tokens on text-heavy pages.
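
As a minimal sketch of how I imagine consuming such hybrid output, assuming each extracted block carried a kind label plus an optional Markdown string and an optional tile path. The `Block` type, the `VISUAL_KINDS` set, and `select_delivery` are all hypothetical names of mine, not part of the existing PixelRAG or `pixelshot` API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical block kinds that should stay as pixels.
VISUAL_KINDS = {"chart", "infographic", "image"}


@dataclass
class Block:
    """Hypothetical unit of hybrid output: one page region."""
    kind: str                        # e.g. "paragraph", "table", "chart"
    markdown: Optional[str] = None   # text rendition, if one was extracted
    tile_path: Optional[str] = None  # screenshot tile, if one was rendered


def select_delivery(blocks: List[Block]) -> List[Tuple[str, str]]:
    """Choose text or image delivery for each block, in document order."""
    parts: List[Tuple[str, str]] = []
    for block in blocks:
        if block.kind in VISUAL_KINDS and block.tile_path:
            # Visual assets keep their tile so layout and data are preserved.
            parts.append(("image", block.tile_path))
        elif block.markdown:
            # Text blocks use the cheaper Markdown rendition.
            parts.append(("text", block.markdown))
        elif block.tile_path:
            # No text rendition available: fall back to the tile.
            parts.append(("image", block.tile_path))
    return parts
```

The fallback branch matters for regions such as complex tables, where a text rendition may be missing or unreliable and the tile remains the safer choice.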


