Add vision caption console and encoder pipeline #97

zhongkaifu · 2025-11-14T17:28:39Z

Summary

add a VisionCaption application that turns image patches into transformer-friendly embeddings so we can train captioning models without the legacy Image2Seq code
expose the new application through a dedicated VisionCaptionConsole CLI and wire it into the solution
fix the vision corpus token accounting and add ImageSharp as a dependency for loading and normalizing images

⚠️ dotnet build Seq2SeqSharp.sln (not run: dotnet CLI is unavailable in the execution environment)

Add vision caption console and encoder pipeline

6922f7d

zhongkaifu added the codex label Nov 14, 2025 — with ChatGPT Codex Connector

zhongkaifu added 4 commits November 14, 2025 15:10

Add README for Vision Caption Console

dfca81f

Adopt ViT-style patch embeddings

16b9a90

Stabilize vision encoder signals

ec301f2

Update

b4f6bd9